HIV envelope polypeptides

ABSTRACT

PCT No. PCT/US94/06036 Sec. 371 Date Oct. 10, 1995 Sec. 102(e) Date Oct. 10, 1995 PCT Filed Jun. 7, 1994 PCT Pub. No. WO94/28929 PCT Pub. Date Dec. 22, 1994

A method for the rational design and preparation of vaccines based on HIV envelope polypeptides is described. In one embodiment, the method for making an HIV gp120 subunit vaccine for a geographic region comprises determining neutralizing epitopes in the V2 and/or C4 domains of gp120 of HIV as depicted in the figure. In a preferred embodiment of the method, neutralizing epitopes for the V2, V3 and C4 domains of gp120 are determined. Also described are DNA sequences encoding gp120 from preferred vaccine strains of HIV.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a national stage filing under 35 U.S.C. 371 of PCT/US94/06036, filed Jun. 7, 1994, which is a continuation-in-part of U.S. application Ser. No. 08/072,833, filed Jun. 7, 1993, now abandoned.

FIELD OF THE INVENTION

This invention relates to the rational design and preparation of HIV vaccines based on HIV envelope polypeptides and to the resultant vaccines. This invention further relates to improved methods for HIV serotyping and to immunogens which induce antibodies useful in the serotyping methods.

BACKGROUND OF THE INVENTION

Acquired immunodeficiency syndrome (AIDS) is caused by a retrovirus identified as the human immunodeficiency virus (HIV). There have been intense efforts to develop a vaccine, and these efforts have focused on inducing antibodies to the HIV envelope protein. For safety reasons, recent efforts have used subunit vaccines, in which an HIV protein, rather than attenuated or killed virus, is used as the immunogen. Subunit vaccines generally include gp120, the portion of the HIV envelope protein which is on the surface of the virus.

The HIV envelope protein has been extensively described, and the amino acid and RNA sequences encoding HIV envelope from a number of HIV strains are known (Myers, G. et al., 1992. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences. Los Alamos National Laboratory, Los Alamos, N. Mex.). The HIV envelope protein is a glycoprotein of about 160 kDa (gp160) which is anchored in the membrane bilayer at its carboxyl terminal region. The N-terminal segment, gp120, protrudes into the aqueous environment surrounding the virion, and the C-terminal segment, gp41, spans the membrane. Via a host-cell mediated process, gp160 is cleaved to form gp120 and the integral membrane protein gp41. Because there is no covalent attachment between gp120 and gp41, free gp120 is released from the surface of virions and infected cells.

The gp120 molecule consists of a polypeptide core of 60,000 daltons which is extensively modified by N-linked glycosylation to increase the apparent molecular weight of the molecule to 120,000 daltons. The amino acid sequence of gp120 contains five relatively conserved domains interspersed with five hypervariable domains. The positions of the 18 cysteine residues in the gp120 primary sequence, and the positions of 13 of the approximately 24 N-linked glycosylation sites in the gp120 sequence, are common to all gp120 sequences. The hypervariable domains contain extensive amino acid substitutions, insertions and deletions. Sequence variations in these domains result in up to 30% overall sequence variability between gp120 molecules from the various viral isolates. Despite this variation, all gp120 sequences preserve the virus's ability to bind to the viral receptor CD4 and to interact with gp41 to induce fusion of the viral and host cell membranes.

gp120 has been the object of intensive investigation as a vaccine candidate for subunit vaccines, as the viral protein which is most likely to be accessible to immune attack. gp120 is considered to be a good candidate for a subunit vaccine because (i) gp120 is known to possess the CD4 binding domain by which HIV attaches to its target cells, (ii) HIV infectivity can be neutralized in vitro by antibodies to gp120, (iii) the majority of the in vitro neutralizing activity present in the serum of HIV infected individuals can be removed with a gp120 affinity column, and (iv) the gp120/gp41 complex appears to be essential for the transmission of HIV by cell-to-cell fusion.

The identification of epitopes recognized by virus neutralizing antibodies is critical for the rational design of vaccines effective against HIV-1 infection. One way in which antibodies would be expected to neutralize HIV-1 infection is by blocking the binding of the HIV-1 envelope glycoprotein, gp120, to its cellular receptor, CD4. However, it has been surprising that the CD4 blocking activity readily demonstrated in sera from HIV-1 infected individuals (31, 44) and animals immunized with recombinant envelope glycoproteins (1-3) has not always correlated with neutralizing activity (2, 31, 44). Results obtained with monoclonal antibodies have shown that while some of the monoclonal antibodies that block the binding of gp120 to CD4 possess neutralizing activity, others do not (4, 7, 16, 26, 33, 35, 43, 45). When the neutralizing activity of CD4 blocking monoclonal antibodies is compared to that of antibodies directed to the principal neutralizing determinant (PND) located in the third variable domain (V3 domain) of gp120 (10, 39), the CD4 blocking antibodies appear to be significantly less potent. Thus, CD4 blocking monoclonal antibodies typically exhibit 50% inhibitory concentration values (IC₅₀) in the 1-10 μg/ml range (4, 16, 26, 33, 35, 43, 45), whereas PND directed monoclonal antibodies typically exhibit IC₅₀ values in the 0.1 to 1.0 μg/ml range (23, 33, 42).

Subunit vaccines, based on gp120 or another viral protein, that can effectively induce antibodies that neutralize HIV are still being sought. To date, however, no vaccine has been effective in conferring protection against HIV infection.

DESCRIPTION OF THE BACKGROUND ART

Recombinant subunit vaccines are described in Berman et al., PCT/US91/02250 (published as number WO91/15238 on 17 Oct. 1991). See also, e.g., Hu et al., Nature 328:721-724 (1987) (vaccinia virus-HIV envelope recombinant vaccine); Arthur et al., J. Virol. 63(12):5046-5053 (1989) (purified gp120); and Berman et al., Proc. Natl. Acad. Sci. USA 85:5200-5204 (1988) (recombinant envelope glycoprotein gp120).

Numerous sequences for gp120 are known. The sequence of gp120 from the III substrain of HIV-1_(LAI) referred to herein is that determined by Muesing et al., "Nucleic acid structure and expression of the human AIDS/lymphadenopathy retrovirus," Nature 313:450-458 (1985). The sequences of gp120 from the NY-5, Jrcsf, Z6, Z321, and HXB2 strains of HIV-1 are listed by Myers et al., "Human Retroviruses and AIDS; A compilation and analysis of nucleic acid and amino acid sequences," Los Alamos National Laboratory, Los Alamos, N. Mex. (1992). The sequence of the Thai isolate A244 is provided by McCutchan et al., "Genetic Variants of HIV-1 in Thailand," AIDS Res. and Human Retroviruses 8:1887-1895 (1992). The MN₁₉₈₄ clone is described by Gurgo et al., "Envelope sequences of two new United States HIV-1 isolates," Virol. 164:531-536 (1988). The amino acid sequence of this MN clone differs by approximately 2% from the MN-gp120 clone (MN_(GNE)) disclosed herein and obtained by Berman et al.

Each of the above-described references is incorporated herein by reference in its entirety.

SUMMARY OF THE INVENTION

The present invention provides a method for the rational design and preparation of vaccines based on HIV envelope polypeptides. This invention is based on the discovery that there are neutralizing epitopes in the V2 and C4 domains of gp120, in addition to the neutralizing epitopes in the V3 domain. In addition, the amount of variation of the neutralizing epitopes is highly constrained, facilitating the design of an HIV subunit vaccine that can induce antibodies that neutralize a plurality of HIV strains for a given geographic region.

In one embodiment, the present invention provides a method for making an HIV gp120 subunit vaccine for a geographic region, in which a neutralizing epitope in the V2 and/or C4 domains of gp120 of HIV isolates from the geographic region is determined. An HIV strain whose gp120 has a neutralizing epitope in the V2 or C4 domain that is common among isolates in the geographic region is then selected and used to make the vaccine.

In a preferred embodiment of the method, neutralizing epitopes for the V2, V3, and C4 domains of gp120 from HIV isolates from the geographic region are determined. At least two HIV isolates having different neutralizing epitopes in the V2, V3, or C4 domain are selected and used to make the HIV gp120 subunit vaccine. Preferably, each of the selected isolates has one of the most common neutralizing epitopes for the V2, V3, or C4 domains.

The invention also provides a multivalent HIV gp120 subunit vaccine. The vaccine comprises gp120 from two isolates of HIV having at least one different neutralizing epitope. Preferably, the isolates have the most common neutralizing epitopes in the geographic region for one of the domains.

Also provided are DNA sequences of less than 5 kilobases encoding gp120 from the preferred vaccine strains of HIV, GNE₈ and GNE₁₆; expression constructs comprising the GNE₈-gp120 and GNE₁₆-gp120 encoding DNA under the transcriptional and translational control of a heterologous promoter; and isolated GNE₈-gp120 and GNE₁₆-gp120. The invention further provides improved methods for HIV serotyping in which epitopes in the V2 or C4 domains of gp120 are determined, and provides immunogens (truncated gp120 sequences) which induce antibodies useful in the serotyping methods.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 describes inhibition of CD4 binding by monoclonal antibodies to recombinantly produced gp120 from the MN strain of HIV (MN-rgp120). Mice were immunized with MN-rgp120, and the resulting splenocytes were fused with the NP3X63.Ag8.653 cell line as described in Example 1. Thirty-five stable hybridoma clones reactive with MN-rgp120 were identified by ELISA. Secondary screening revealed seven cell lines (1024, 1093, 1096, 1097, 1110, 1112, and 1027) secreting antibodies able to inhibit the binding of MN-rgp120 to biotin-labeled, recombinantly produced CD4 (rsCD4) in an ELISA using HRPO-streptavidin. Data obtained with monoclonal antibodies from the same fusion (1026, 1092, 1126) that failed to inhibit MN-rgp120 binding to CD4 are shown for purposes of comparison.

FIG. 2 shows the neutralizing activity of CD4-blocking monoclonal antibodies to MN-rgp120. Monoclonal antibodies that blocked the binding of MN-rgp120 to CD4 were screened for the capacity to inhibit the infection of MT-2 cells by the MN strain of HIV-1 in vitro. Cell-free virus was added to wells containing serially diluted antibodies and incubated at 4° C. for 1 hr. After incubation, MT-2 cells were added to the wells, and the cultures were grown for 5 days at 37° C. Cell viability was then measured by addition of the colorimetric tetrazolium compound MTT as described in reference (35) of Example 1. The optical density of each well was measured at 540 nm using a microtiter plate reading spectrophotometer. Inhibition of virus infectivity was calculated by dividing the mean optical density of wells containing monoclonal antibodies by the mean value of wells that received virus alone. The monoclonal antibodies that blocked CD4 binding are the same as those indicated in the legend to FIG. 1. Data from the V3-directed monoclonal antibody to MN-rgp120 (1034) are provided as a positive control. Data obtained with the V3-directed monoclonal antibody 11G5, specific for the IIIB strain of HIV-1 (33), are shown as a negative control.

FIGS. 3A-3B are diagrams of gp120 fragments used to localize the epitopes recognized by the CD4 blocking monoclonal antibodies to MN-rgp120. A series of fragments (A) corresponding to the V4 and C4 domains (B) (SEQ. ID. NO. 14) of the gene encoding MN-rgp120 were prepared by PCR. The gp120 gene fragments were fused to a fragment of the gene encoding Herpes Simplex Virus Type 1 glycoprotein D that encoded the signal sequence and 25 amino acids from the mature amino terminus. The chimeric genes were assembled into a mammalian cell expression vector (pRK5) that provided a CMV promoter, translational stop codons and an SV40 polyadenylation site. The embryonic human kidney adenocarcinoma cell line, 293s, was transfected with the resulting plasmid, and recombinant proteins were recovered from growth-conditioned cell culture medium. Fragments of MN-rgp120, expressed as HSV-1 gD fusion proteins, were produced by transient transfection of 293s cells (Example 1). To verify expression, cells were metabolically labeled with [³⁵S]-methionine, and the resulting growth-conditioned cell culture supernatants were immunoprecipitated (C) using fixed S. aureus and a monoclonal antibody, 5B6, specific for the amino terminus of HSV-1 gD. The immunoprecipitated proteins were resolved on 4 to 20% acrylamide gradient gels using SDS-PAGE and visualized by autoradiography. The samples were: lane 1, FMN.368-408; lane 2, FMN.368-451; lane 3, FMN.419-443; lane 4, FMN.414-451; lane 5, MN-rgp120. The gel demonstrated that the proteins were expressed and migrated at the expected molecular weights.

FIG. 4 shows a C4 domain sequence comparison (SEQ. ID. Nos. 3-13). The C4 domain amino acid sequences of recombinant and virus-derived gp120s used for monoclonal antibody binding studies were aligned starting at the amino-terminal cysteine. Amino acid positions are designated with respect to the sequence of MN-rgp120. Sequences of the LAI substrains IIIB, BH10, Bru, HXB2, and HXB3 are shown for purposes of comparison.

FIG. 5 shows sequences of C4 domain mutants of MN-rgp120 (SEQ. ID. Nos. 3 and 15-23). Nucleotide substitutions, resulting in the amino acid sequences indicated, were introduced into the C4 domain of the MN-rgp120 gene using recombinant PCR. The resulting variants were assembled into the expression plasmid pRK5, which was then transfected into 293s cells. The binding of monoclonal antibodies to the resulting C4 domain variants was then analyzed by ELISA (Table 5).

FIG. 6 illustrates the reactivity of monoclonal antibody 1024 with HIV-1_(LAI) substrains. The cell surface binding of the C4 domain reactive monoclonal antibody 1024 to H9 cells chronically infected with the IIIB, HXB2, HXB3, and HXB10 substrains of HIV-1 LAI, or with HIV-1 MN, was analyzed by flow cytometry. Cultures of virus infected cells were reacted with monoclonal antibody 1024, a nonrelevant monoclonal antibody (control), or a broadly cross-reactive monoclonal antibody (1026) raised against rgp120. After unbound monoclonal antibody was washed away, the cells were labeled with fluorescein-conjugated goat antibody to mouse IgG (Fab')₂, washed, and fixed with paraformaldehyde. The resulting cells were analyzed for fluorescence intensity using a FACScan (Becton Dickinson, Fullerton, Calif.). Fluorescence was measured as the mean intensity of the cells, expressed as mean channel number plotted on a log scale.

FIGS. 7A-7D show the determination of the binding affinity of monoclonal antibodies for MN-rgp120. CD4 blocking monoclonal antibodies raised against MN-rgp120 (1024 and 1097) or IIIB-rgp120 (13H8 and 5C2) were labeled with [¹²⁵I], and binding titrations using MN-rgp120 (A and B) or IIIB-rgp120 (C and D) were carried out as described in Example 1. A, binding of monoclonal antibody 1024; B, binding of monoclonal antibody 1097; C, binding of monoclonal antibody 13H8; and D, binding of monoclonal antibody 5C2.

FIG. 8 shows the correlation between gp120 binding affinity (K_d) and neutralizing activity (IC₅₀) of monoclonal antibodies to the C4 domain of MN-rgp120. Binding affinities of monoclonal antibodies to the C4 domain of gp120 were determined by Scatchard analysis (FIG. 7, Table 5). The resulting values were plotted as a function of the log of their neutralizing activities (IC₅₀) determined in FIG. 2 and Table 6.
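The relationship plotted in FIG. 8 can be summarized by a single correlation coefficient. The Python sketch below is illustrative only: it assumes a Pearson correlation between K_d and log10(IC₅₀), which the text does not specify, and the function names are hypothetical. Any consistent units (for example nM for K_d and μg/ml for IC₅₀) may be used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def affinity_vs_log_ic50(kd_values, ic50_values):
    """Correlate K_d with log10(IC50), matching the axes described for FIG. 8.

    Hypothetical helper: assumes a Pearson correlation is the statistic of interest.
    """
    log_ic50 = [math.log10(v) for v in ic50_values]
    return pearson_r(log_ic50, kd_values)
```

A value near +1 would indicate that weaker binders (higher K_d) need proportionally higher antibody concentrations to neutralize.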

FIG. 9 depicts the amino acid sequence of the mature envelope glycoprotein (gp120) from the MN_(GNE) clone of the MN strain of HIV-1 (SEQ. ID. NO. 1). Hypervariable domains are from 1-29 (signal sequence), 131-156, 166-200, 305-332, 399-413, and 460-469. The V and C regions are indicated (according to Modrow et al., J. Virology 61(2):570 (1987)). Potential glycosylation sites are marked with an asterisk (*).

FIG. 10 depicts the amino acid sequence of a fusion protein of residues 41-511 of the mature envelope glycoprotein (gp120) from the MN_(GNE) clone of the MN strain of HIV-1 and the gD-1 amino terminus from the herpes simplex glycoprotein gD-1 (SEQ. ID. NO. 2). The V and C regions are indicated (according to Modrow et al., J. Virology 61(2):570 (1987)). Potential glycosylation sites are marked with an asterisk (*).

DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method for the rational design and preparation of vaccines based on HIV envelope polypeptides. This invention is based on the discovery that there are neutralizing epitopes in the V2 and C4 domains of gp120, in addition to the neutralizing epitopes in the V3 domain. Although the amino acid sequences of the neutralizing epitopes in the V2, V3, and C4 domains are variable, it has now been found that the amount of variation is highly constrained. The limited amount of variation facilitates the design of an HIV subunit vaccine that can induce antibodies that neutralize the most common HIV strains for a given geographic region. In particular, the amino acid sequence of neutralizing epitopes in the V2, V3, and C4 domains for isolates of a selected geographic region is determined, and gp120 from isolates having the most common neutralizing epitope sequences is used in the vaccine.

The invention also provides a multivalent gp120 subunit vaccine wherein gp120 present in the vaccine is from at least two HIV isolates which have different amino acid sequences for a neutralizing epitope in the V2, V3, or C4 domain of gp120. The invention further provides improved methods for HIV serotyping in which epitopes in the V2 or C4 domains of gp120 are determined, and provides immunogens which induce antibodies useful in the serotyping methods.

The term "subunit vaccine" is used herein, as in the art, to refer to a viral vaccine that does not contain virus, but rather contains one or more viral proteins or fragments of viral proteins. As used herein, the term "multivalent" means that the vaccine contains gp120 from at least two HIV isolates having different amino acid sequences for a neutralizing epitope.

Vaccine Design Method

The vaccine design method of this invention is based on the discovery that there are neutralizing epitopes in the V2 and C4 domains of gp120, in addition to those found in the principal neutralizing domain (PND) in the V3 domain. Selecting an HIV isolate with appropriate neutralizing epitopes in the V2 and/or C4 domains provides a vaccine that is designed to induce immunity to the HIV isolates present in a selected geographic region. In addition, although the amino acid sequence of the V2, V3, and C4 domains containing the neutralizing epitopes is variable, the amount of variation is highly constrained, facilitating the design of a multivalent vaccine which can neutralize a plurality of the most common HIV strains for a given geographic region.

The method for making an HIV gp120 subunit vaccine depends on the use of appropriate strains of HIV for a selected geographic region. Appropriate strains of HIV for the region are selected by determining the neutralizing epitopes of HIV isolates and the percentage of HIV infections attributable to each strain present in the region. HIV strains which have the most common neutralizing epitopes in the V2 or C4 domains in the geographic region are selected. Preferably, isolates that confer protection against HIV strains bearing the most common neutralizing epitopes in the V2, V3, and C4 domains for a geographic region are selected.
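The tally behind "most common neutralizing epitope" can be sketched as follows. This is a minimal illustration, assuming each sequenced isolate counts once and that the epitope sequence for one domain has already been extracted for each isolate; the function names are hypothetical.

```python
from collections import Counter


def epitope_frequencies(isolates):
    """Fraction of regional isolates carrying each epitope sequence.

    `isolates` maps an isolate name to its epitope sequence for one domain
    (for example, its V2 or C4 neutralizing epitope).
    """
    counts = Counter(isolates.values())
    total = sum(counts.values())
    return {epitope: n / total for epitope, n in counts.items()}


def most_common_epitope(isolates):
    """Epitope sequence shared by the largest number of isolates."""
    epitope, _ = Counter(isolates.values()).most_common(1)[0]
    return epitope


def representative_isolate(isolates, epitope):
    """One isolate (alphabetically first) that carries `epitope`."""
    carriers = sorted(name for name, ep in isolates.items() if ep == epitope)
    if not carriers:
        raise KeyError(epitope)
    return carriers[0]
```

If infections rather than isolates should be weighted, the counts would need to be replaced by the epidemiological percentages discussed later in this section.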

One embodiment of the method for making an HIV gp120 subunit vaccine from appropriate strains of HIV for a geographic region comprises the following steps. A neutralizing epitope in the V2 or C4 domain of gp120 of HIV isolates from the geographic region is determined. An HIV strain having gp120 with a neutralizing epitope in the V2 or C4 domain that is common among HIV isolates in the geographic region is selected. gp120 from the selected isolate is used to make an HIV gp120 subunit vaccine.

In another embodiment of the method, the neutralizing epitopes in the V2, V3, and C4 domains of gp120 from HIV isolates from the geographic region are determined. At least two HIV isolates having different neutralizing epitopes in the V2, V3, or C4 domain are selected and used to make an HIV gp120 subunit vaccine. Preferably, the vaccine contains gp120 from at least the two or three HIV strains having the most common neutralizing epitopes for the V2, V3, or C4 domains. More preferably, the vaccine contains gp120 from enough strains that at least about 50%, preferably about 70%, and more preferably about 80% or more of the neutralizing epitopes for the V2, V3, and C4 domains in the geographic region are included in the vaccine. The location of the neutralizing epitopes in the V3 region is well known. The location of the neutralizing epitopes in the V2 and C4 regions is described hereinafter.
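Choosing "sufficient strains" to reach a coverage target is a set-cover style problem. The sketch below uses a greedy heuristic, which is a swapped-in technique and not a procedure stated in this text. It assumes coverage is the summed regional prevalence of the epitopes carried by the chosen strains divided by the total prevalence across all domains; the function name and the default 0.8 target (mirroring "about 80%") are illustrative.

```python
def select_vaccine_strains(strain_epitopes, epitope_prevalence, target=0.8):
    """Greedily pick strains until the chosen epitopes reach `target` coverage.

    strain_epitopes: strain name -> set of epitope sequences it carries
        (e.g. its V2, V3 and C4 epitopes).
    epitope_prevalence: epitope sequence -> fraction of regional isolates
        carrying it (hypothetical input, one entry per epitope per domain).
    Returns (chosen strains in order, fraction of prevalence covered).
    """
    total = sum(epitope_prevalence.values())
    if total <= 0:
        raise ValueError("epitope prevalences must sum to a positive value")

    def weight(epitopes):
        # Summed regional prevalence of a set of epitope sequences.
        return sum(epitope_prevalence.get(e, 0.0) for e in epitopes)

    covered, chosen = set(), []
    remaining = dict(strain_epitopes)
    while remaining and weight(covered) / total < target:
        # Greedy step: the strain adding the most not-yet-covered prevalence.
        best = max(remaining, key=lambda s: weight(remaining[s] - covered))
        if weight(remaining[best] - covered) == 0:
            break  # no remaining strain improves coverage
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, weight(covered) / total
```

Greedy selection is not guaranteed to find the smallest set of strains, but it is simple and tends to favor strains carrying several common epitopes at once.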

Each of the steps of the method is described in detail below.

Determining neutralizing epitopes

The first step in designing a vaccine for a selected geographic region is to determine the neutralizing epitopes in the gp120 V2 and/or C4 domains. In a preferred embodiment, neutralizing epitopes in the V3 domain (the principal neutralizing domain) are also determined. The location of neutralizing epitopes in the V3 domain is well known. Neutralizing epitopes in the V2 and C4 domains have now been found to be located between about residues 163 and 200 and between about residues 420 and 440, respectively. In addition, the critical residues for antibody binding are residues 171, 173, 174, 177, 181, 183, 187, and 188 in the V2 domain and residues 429 and 432 in the C4 domain, as described in detail in the Examples.
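Given a sequence numbered in the same scheme as the residue positions above, the V2 and C4 regions and their critical residues can be pulled out directly. This sketch assumes the input string already uses that numbering (character i-1 is residue i); isolates with upstream insertions or deletions would first need their positions mapped to the reference. The constant and function names are hypothetical.

```python
V2_REGION = (163, 200)  # approximate V2 neutralizing-epitope region
C4_REGION = (420, 440)  # approximate C4 neutralizing-epitope region
V2_CRITICAL = (171, 173, 174, 177, 181, 183, 187, 188)
C4_CRITICAL = (429, 432)


def region(sequence, start, end):
    """Residues start..end (1-based, inclusive) of a gp120 sequence."""
    return sequence[start - 1:end]


def residues_at(sequence, positions):
    """Map each 1-based position to the residue found there."""
    out = {}
    for pos in positions:
        if not 1 <= pos <= len(sequence):
            raise IndexError(f"position {pos} outside sequence of length {len(sequence)}")
        out[pos] = sequence[pos - 1]
    return out


def epitope_signature(sequence, positions):
    """Concatenate the critical residues into a short, comparable string."""
    found = residues_at(sequence, positions)
    return "".join(found[p] for p in positions)
```

Two isolates with the same signature at the critical positions are candidates for sharing a neutralizing epitope, although residues outside those positions may still matter.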

The neutralizing epitopes for any isolate can be determined by sequencing the region of gp120 containing the neutralizing epitope. Alternatively, when antibodies specific for the neutralizing epitope, preferably monoclonal antibodies, are available, the neutralizing epitope can be determined by serological methods as described hereinafter. A method for identification of additional neutralizing epitopes in gp120 is described hereinafter.

When discussing the amino acid sequences of various isolates and strains of HIV, the most common numbering system refers to the location of amino acids within the gp120 protein using the initiator methionine residue as position 1. The amino acid numbering reflects the mature HIV-1 gp120 amino acid sequence as shown in FIG. 9 and FIG. 10 (SEQ. ID. Nos. 1 and 2). For gp120 sequences derived from other HIV isolates which include their native HIV N-terminal signal sequence, numbering may differ. Although the nucleotide and amino acid residue numbers may not be applicable in other strains where upstream deletions or insertions change the length of the viral genome and gp120, the region encoding the portions of gp120 is readily identified by reference to the teachings herein. The variable (V) domains and conserved (C) domains of gp120 are specified according to the nomenclature of Modrow et al., "Computer-assisted analysis of envelope protein sequences of seven human immunodeficiency virus isolates: predictions of antigenic epitopes in conserved and variable regions," J. Virol. 61:570-578 (1987).
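When insertions or deletions shift positions, reference numbering can be carried over to another isolate through a pairwise alignment. The sketch assumes the alignment has already been produced by some alignment tool and is given as two equal-length gapped strings using '-' for gaps; the function name is hypothetical.

```python
def reference_to_isolate_positions(ref_aligned, iso_aligned):
    """Map 1-based reference positions to 1-based isolate positions.

    Both inputs are rows of the same pairwise alignment, with '-' marking gaps.
    Reference positions that fall opposite a gap in the isolate map to None.
    """
    if len(ref_aligned) != len(iso_aligned):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    ref_pos = iso_pos = 0
    for r, i in zip(ref_aligned, iso_aligned):
        if r != "-":
            ref_pos += 1
        if i != "-":
            iso_pos += 1
        if r != "-":
            # Record where this reference residue lands in the isolate.
            mapping[ref_pos] = iso_pos if i != "-" else None
    return mapping
```

For example, reference residue 429 of the C4 domain can be looked up as `mapping[429]` to find the corresponding residue in the isolate.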

The first step in identifying the neutralizing epitopes for any region of gp120 is to immunize an animal with gp120 to induce anti-gp120 antibodies. The antibodies can be polyclonal or, preferably, monoclonal. Polyclonal antibodies can be induced by administering to the host animal an immunogenic composition comprising gp120. Preparation of immunogenic compositions of a protein may vary depending on the host animal and the protein and is well known. For example, gp120 or an antigenic portion thereof can be conjugated to an immunogenic substance such as KLH or BSA, or provided in an adjuvant or the like. The induced antibodies can be tested to determine whether the composition is specific for gp120. If a polyclonal antibody composition does not provide the desired specificity, the antibodies can be fractionated to enhance specificity by a variety of conventional methods, such as ion exchange chromatography and immunoaffinity methods using intact gp120 or various fragments of gp120. For example, the composition can be fractionated to reduce binding to other substances by contacting the composition with gp120 affixed to a solid substrate; those antibodies which bind to the substrate are retained. Fractionation techniques using antigens affixed to a variety of solid substrates, such as affinity chromatography materials including Sephadex, Sepharose and the like, are well known.

Monoclonal anti-gp120 antibodies can be produced by a number of conventional methods. A mouse can be injected with an immunogenic composition containing gp120, and spleen cells obtained. Those spleen cells can be fused with a fusion partner to prepare hybridomas. Antibodies secreted by the hybridomas can be screened to select a hybridoma whose antibodies neutralize HIV infectivity, as described hereinafter. Hybridomas that produce antibodies of the desired specificity are cultured by standard techniques.

Infected human lymphocytes can be used to prepare human hybridomas by a number of techniques, such as fusion with a murine fusion partner or transformation with EBV. In addition, combinatorial antibody libraries derived from human or mouse spleen can be expressed in E. coli to produce the antibodies. Kits for preparing combinatorial libraries are commercially available. Hybridoma preparation techniques and culture methods are well known and constitute no part of the present invention. Exemplary preparations of monoclonal antibodies are described in the Examples.

Following preparation of anti-gp120 monoclonal antibodies, the antibodies are screened to identify those which are neutralizing antibodies. Assays to determine whether a monoclonal antibody neutralizes HIV infectivity are well known and are described in the literature. Briefly, dilutions of antibody and HIV stock are combined and incubated for a time sufficient for antibody binding to the virus. Thereafter, cells that are susceptible to HIV infection are combined with the virus/antibody mixture and cultured. MT-2 cells or H9 cells are susceptible to infection by most HIV strains that are adapted for growth in the laboratory. Activated peripheral blood mononuclear cells (PBMCs) or macrophages can be infected with primary isolates (isolates from patient specimens which have not been cultured in T-cell lines or transformed cell lines). Daar et al., Proc. Natl. Acad. Sci. USA 87:6574-6578 (1990) describe methods for infecting cells with primary isolates.

After culturing the cells for about five days, the number of viable cells is determined, for example by measuring metabolic conversion of the tetrazolium dye MTT to its formazan product. The percentage inhibition of infectivity is calculated to identify those antibodies that neutralize HIV. An exemplary preferred procedure for determining HIV neutralization is described in the Examples.
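The percentage-inhibition and IC₅₀ calculations can be sketched as below. Note that the FIG. 2 legend describes a simple ratio to virus-only wells; this sketch instead uses a common normalization between virus-only wells (0%) and uninfected cell controls (100%), which is an assumption, and it estimates IC₅₀ by linear interpolation on a log10 concentration scale rather than by curve fitting. Function names are hypothetical.

```python
import math


def percent_inhibition(od_sample, od_virus_only, od_cells_only):
    """Percent protection from virus-induced killing in an MTT readout.

    Assumed normalization: 0% = wells with virus alone,
    100% = uninfected cell controls.
    """
    span = od_cells_only - od_virus_only
    if span <= 0:
        raise ValueError("cell-only OD must exceed virus-only OD")
    return 100.0 * (od_sample - od_virus_only) / span


def ic50_by_interpolation(concentrations, inhibitions):
    """Concentration giving 50% inhibition, interpolated on a log10 scale.

    `concentrations` must be ascending and positive. Returns None if the
    curve never rises through 50% between two tested concentrations.
    """
    points = list(zip(concentrations, inhibitions))
    for (c1, i1), (c2, i2) in zip(points, points[1:]):
        if i1 < 50.0 <= i2:
            frac = (50.0 - i1) / (i2 - i1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    return None
```

With antibody concentrations in μg/ml, the returned IC₅₀ is in μg/ml and can be compared with the 1-10 μg/ml and 0.1-1.0 μg/ml ranges cited in the background.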

Those monoclonal antibodies which neutralize HIV are used to map the epitopes to which the antibodies bind. To determine the location of a gp120 neutralizing epitope, neutralizing antibodies are combined with fragments of gp120 to determine the fragments to which the antibodies bind. The gp120 fragments used to localize the neutralizing epitopes are preferably made by recombinant DNA methods as described hereinafter and exemplified in the Examples. By using a plurality of fragments, each encompassing different, overlapping portions of gp120, an amino acid sequence encompassing a neutralizing epitope to which a neutralizing antibody binds can be determined. A preferred exemplary determination of the neutralizing epitopes to which a series of neutralizing antibodies binds is described in detail in the Examples.
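The logic of narrowing an epitope with overlapping fragments can be expressed as an interval intersection. This minimal sketch assumes the epitope is contiguous and lies wholly inside every fragment the antibody binds; conformational epitopes, or epitopes disrupted at fragment boundaries, would violate that assumption. Non-binding fragments are accepted as input but not used in this simple version. The function name is hypothetical.

```python
def localize_epitope(fragments):
    """Narrow a linear epitope from overlapping-fragment binding results.

    fragments: iterable of (start, end, binds) using 1-based, inclusive
    residue numbers. The estimate is the intersection of all fragments
    the antibody binds. Returns (start, end), or None if no fragment was
    bound or the binding fragments do not overlap.
    """
    bound = [(s, e) for s, e, binds in fragments if binds]
    if not bound:
        return None
    start = max(s for s, _ in bound)
    end = min(e for _, e in bound)
    return (start, end) if start <= end else None
```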

This use of overlapping fragments can narrow the location of the epitope to a region of about 20 to 40 residues. To confirm the location of the epitope and narrow it to a region of about 5 to 10 residues, site-directed mutagenesis studies are preferably performed. Such studies can also identify the residues critical for binding of neutralizing antibodies. A preferred exemplary site-directed mutagenesis procedure is described in the Examples.

To perform site-directed mutagenesis studies, recombinant PCR techniques can be used to introduce single amino acid substitutions at selected sites into gp120 fragments containing the neutralizing epitope. Briefly, overlapping portions of the region containing the epitope are amplified using primers that incorporate the desired nucleotide changes. The resulting PCR products are annealed and amplified to generate the final product, which is then expressed to produce a mutagenized gp120 fragment. Expression of DNA encoding gp120 or a portion thereof is described hereinafter and exemplified in the Examples.
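In recombinant (overlap-extension) PCR, the two inner mutagenic primers are commonly reverse complements of each other, so the two amplified halves overlap at the mutation and can be annealed. The sketch below designs such a pair for one codon swap. The fixed `flank` length is a hypothetical simplification; practical primer design also balances melting temperature and GC content.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def mutagenic_primers(dna, codon_number, new_codon, flank=15):
    """Design an overlapping primer pair that replaces one codon.

    dna: in-frame coding sequence starting at codon 1.
    codon_number: 1-based codon to replace with `new_codon`.
    flank: bases kept on each side of the mutated codon (hypothetical default).
    Returns (forward, reverse); reverse is the reverse complement of forward.
    """
    if len(new_codon) != 3:
        raise ValueError("new_codon must be three bases")
    start = (codon_number - 1) * 3
    if start < 0 or start + 3 > len(dna):
        raise IndexError("codon outside sequence")
    mutated = dna[:start] + new_codon + dna[start + 3:]
    left = max(0, start - flank)
    right = min(len(mutated), start + 3 + flank)
    forward = mutated[left:right]
    return forward, reverse_complement(forward)
```

Each mutagenic primer would be paired with an outer primer at the opposite end of the fragment, and the two products then annealed and re-amplified with the outer primers.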

In a preferred embodiment described in Example 1, the gp120 fragments are expressed in mammalian cells that are capable of producing gp120 fragments having the same glycosylation and disulfide bonds as native gp120. Proper glycosylation and disulfide bonding make these fragments more likely to preserve the neutralizing epitopes than fragments expressed in E. coli, for example, which lack disulfide bonds and glycosylation, or chemically synthesized fragments, which lack glycosylation and may lack disulfide bonds.

The mutagenized gp120 fragments are then used in an immunoassay, with gp120 as a control, to identify the mutations that impair or eliminate binding of the neutralizing antibodies. These critical amino acid residues form part of the neutralizing epitope and can only be altered in limited ways without eliminating the epitope. Each alteration that preserves the epitope can be determined. Such mutagenesis studies reveal which variations in the amino acid sequence of the neutralizing epitope provide equivalent or diminished binding by neutralizing antibodies and which eliminate antibody binding. Although the amino acid sequence of gp120 used in the vaccine preferably is identical to that of a selected HIV isolate for the given geographic region, such studies can identify alterations in the amino acid sequence of the neutralizing epitope that are suitable for use in a vaccine.
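Sorting mutants into equivalent, diminished, or eliminated binding can be done by comparing each mutant's immunoassay signal with the wild-type gp120 control. The 50% and 10% cut-offs and the function name below are hypothetical choices for illustration, not values given in this text.

```python
def classify_binding(mutant_signal, wild_type_signal, background=0.0,
                     reduced_below=0.5, lost_below=0.1):
    """Classify a mutant's antibody binding relative to wild-type gp120.

    Signals (e.g. ELISA optical densities) are background-subtracted and
    expressed as a fraction of the wild-type control. The default cut-offs
    are hypothetical.
    """
    wt = wild_type_signal - background
    if wt <= 0:
        raise ValueError("wild-type signal must exceed background")
    ratio = (mutant_signal - background) / wt
    if ratio < lost_below:
        return "eliminated"
    if ratio < reduced_below:
        return "diminished"
    return "equivalent"
```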

Once a neutralizing epitope is localized to a region of ten to twenty amino acids of gp120, the amino acid sequence of corresponding neutralizing epitopes of other HIV isolates can be determined by identifying the corresponding portion of the gp120 amino acid sequence of the isolate.

Once the neutralizing epitopes for a given region of gp120 are determined, the amino acid sequences of HIV isolates for the geographic region are determined. The complete amino acid sequences of numerous isolates have been determined and are available in journal articles and databases. In such cases, determining the amino acid sequence of an HIV isolate for the geographic region involves looking up the sequence in an appropriate database or journal article. However, for some isolates, the available amino acid sequence information does not include the sequence of the V2 or C4 domains.

When the amino acid sequence of a region of interest for a given isolate is not known, it can be determined by well known methods, which are described in numerous references including Maniatis et al., Molecular Cloning--A Laboratory Manual, Cold Spring Harbor Laboratory (1984). In addition, automated instruments which sequence proteins are commercially available.

Alternatively, the nucleotide sequence of DNA encoding gp120, or a relevant portion of gp120, can be determined and the amino acid sequence of gp120 deduced from it. Methods for amplifying gp120-encoding DNA from HIV isolates to provide sufficient DNA for sequencing are well known. In particular, Ou et al., Science 256:1165-1171 (1992); Zhang et al., AIDS 5:675-681 (1991); and Wolinsky, Science 255:1134-1137 (1992) describe methods for amplifying gp120 DNA. Sequencing of the amplified DNA is well known and is described, for example, in Maniatis et al., Molecular Cloning--A Laboratory Manual, Cold Spring Harbor Laboratory (1984), and Horvath et al., An Automated DNA Synthesizer Employing Deoxynucleoside 3'-Phosphoramidites, Methods in Enzymology 154:313-326 (1987). In addition, automated DNA sequencers are commercially available.
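Deducing the amino acid sequence from the nucleotide sequence is a codon-by-codon translation of the coding strand. A minimal Python sketch, assuming the standard genetic code, reading frame 1, and no IUPAC ambiguity codes (any codon not in the table becomes `X`):

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate coding-strand DNA in frame 1; '*' marks a stop codon,
    'X' an unrecognized codon. A trailing partial codon is ignored."""
    seq = "".join(dna.split()).upper()  # drop the spaces used to lay out sequences
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

Applied to the first 60 nucleotides of the GNE₈ coding strand in Table 1, `translate("ATGATAGTGA AGGGGATCAG GAAGAATTGT CAGCACTTGT GGAGATGGGG CACCATGCTC")` gives `"MIVKGIRKNCQHLWRWGTML"`, the first 20 residues shown in that table.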

In a preferred embodiment, the isolate is a patient isolate that has not been passaged in culture. It is known that, following passage in T-cells, HIV isolates mutate, and variants best suited for growth under cell culture conditions are selected. For example, cell culture strains of HIV develop the ability to form syncytia. Therefore, the amino acid sequence of gp120 is preferably determined from a patient isolate before growth in culture. Generally, DNA from the isolate is amplified to provide sufficient DNA for sequencing, and the deduced amino acid sequence is used as the amino acid sequence of the isolate, as described above.

Standard epidemiological methods are used to determine what percentage of the HIV infecting individuals in the geographic region each isolate represents. In particular, enough isolates are sequenced to give confidence that the percentage of each isolate in the region has been determined. For example, Ichimura et al., AIDS Res. Hum. Retroviruses 10:263-269 (1994) describe an epidemiological study in Thailand that found two strains of HIV present in the region. Because HIV has only recently been present in Thailand, Thailand has the most homogeneous population of HIV isolates known to date. The study sequenced 23 isolates from various parts of the country and found only two different amino acid sequences among them.
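How much confidence a given number of sequenced isolates provides can be expressed as a confidence interval on each sequence's share. The text does not name a statistical method; the sketch below assumes the Wilson score interval, a standard choice for proportions from small samples:

```python
import math


def wilson_interval(count, n, z=1.96):
    """Wilson score interval (about 95% for z = 1.96) for the proportion count/n."""
    if n <= 0 or not 0 <= count <= n:
        raise ValueError("need n > 0 and 0 <= count <= n")
    p = count / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)
```

With hypothetical counts of 10 isolates carrying a given sequence out of 40 sequenced, the interval is roughly 14% to 40%, which shows how wide the uncertainty remains at that sample size.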

In contrast, HIV has been infecting individuals in Africa for longer than in any other geographic region. In Africa, each of the most common isolates probably constitutes about 5% of the population. In such cases, more isolates must be sequenced to determine the percentage of the population that each isolate constitutes. Population studies for determining the percentage of various strains of HIV, or of other viruses, present in a geographic region are well known and are described in, for example, Ou et al., Lancet 341:1171-1174 (1993); Ou et al., AIDS Res. Hum. Retroviruses 8:1471-1472 (1992); and McCutchan et al., AIDS Res. Hum. Retroviruses 8:1887-1895 (1992).
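The number of isolates to sequence can be estimated in advance from the expected share of a sequence and the precision wanted. A sketch assuming the usual normal-approximation formula n = z²·p(1 − p)/m², with hypothetical inputs (the cited studies may have used other designs):

```python
import math


def required_sample_size(expected_fraction, margin, z=1.96):
    """Isolates needed so that a normal-approximation confidence interval
    for a proportion near `expected_fraction` has half-width <= `margin`."""
    if not 0 < expected_fraction < 1 or margin <= 0:
        raise ValueError("expected_fraction must be in (0, 1) and margin > 0")
    return math.ceil(z * z * expected_fraction * (1 - expected_fraction) / margin ** 2)
```

Under these assumptions, pinning a share near 50% to within ±5 percentage points takes 385 isolates, and pinning a share near 5% to within ±2 points takes 457. The normal approximation is rough for small fractions, so an exact or Wilson-based calculation may be preferable in practice.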

In the United States and western Europe, probably about two to four different neutralizing epitopes in each of the V2, V3, and C4 domains constitute 50 to 70% of the neutralizing epitopes for each domain in the geographic region, as described more fully below.

Selection method

Once the amino acid sequences of the neutralizing epitopes for strains in a region are determined, gp120 is selected from an HIV strain whose gp120 has, for a neutralizing epitope in the V2 or C4 domain, one of the most common amino acid sequences in the geographic region. Having one of the most common neutralizing epitope amino acid sequences means that the strain has, for at least one neutralizing epitope, an amino acid sequence that occurs among the most frequent for HIV isolates in the geographic region and is thus present in a significant percentage of the population. For example, if three sequences for a neutralizing epitope constitute 20, 30, and 40 percent of the sequences for that epitope in the region, and the remainder of the population is made up of 2 to 4 other sequences, the three sequences are the most common. Likewise, in African countries, if each of several amino acid sequences constitutes about 5% of the sequences for a neutralizing epitope and each of the remaining sequences constitutes less than 1% of the population, the isolates that each constitute about 5% of the population are the most common.
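The tally behind "most common" can be sketched as counting each epitope sequence among the sequenced isolates and keeping those above a cut-off. The `min_fraction` parameter is an assumption: the text sets no fixed threshold, and its two examples imply different ones (roughly 15 to 20% in the first, about 5% in the African case).

```python
from collections import Counter


def most_common_epitopes(observed, min_fraction):
    """Given one epitope sequence per sequenced isolate, return
    (sequence, fraction) pairs at or above `min_fraction`, most frequent first."""
    counts = Counter(observed)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(seq, n / total) for seq, n in counts.most_common()
            if n / total >= min_fraction]
```

Mirroring the first example, a hypothetical sample with sequences at 40, 30 and 20 isolates plus three rare ones at 4, 3 and 3, filtered with `min_fraction=0.15`, keeps only the three frequent sequences.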

Preferably, isolates having the most common amino acid sequences for a neutralizing epitope are chosen. By the most common is meant the sequences that occur most frequently in the geographic region. For example, in the United States, the MN isolate has a C4 neutralizing epitope sequence found in at least about 45% of the population, and the GNE₈ isolate has a C4 neutralizing epitope sequence found in at least about 45% of the population. Thus each isolate has one of the most common C4 neutralizing epitopes in the region, and when gp120 from both isolates is combined in a vaccine, greater than about 90% of the C4 neutralizing epitope sequences are represented. In addition, the amino acid sequences of the V3 neutralizing epitope in the MN and GNE₈ isolates are substantially similar and together account for about 60% of the population; those strains therefore have the two most common neutralizing epitopes for the V3 domain. In the V2 domain, the MN isolate sequence accounts for about 10% of the population and the GNE₈ isolate sequence for about 60%. Therefore, the GNE₈ strain has the most common V2 neutralizing epitope for the region, and the two strains together have the two most common V2 neutralizing epitopes. A multivalent gp120 subunit vaccine containing gp120 from the two isolates thus contains amino acid sequences for epitopes that constitute about 70% of the V2 domain, about 60% of the V3 domain, and about 90% of the C4 domain for the United States.
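The coverage figures for the MN plus GNE₈ vaccine come from adding the regional frequencies of the distinct epitope sequences that the selected isolates carry, counting a shared sequence only once. A sketch using the approximate United States percentages quoted above; the epitope labels are hypothetical placeholders, and the substantially similar MN and GNE₈ V3 epitopes are treated as a single class:

```python
# Approximate U.S. frequencies quoted in the text; labels are placeholders.
US_FREQUENCIES = {
    "V2": {"V2-MN": 0.10, "V2-GNE8": 0.60},
    "V3": {"V3-MN/GNE8": 0.60},  # MN and GNE8 V3 epitopes treated as one class
    "C4": {"C4-MN": 0.45, "C4-GNE8": 0.45},
}
ISOLATE_EPITOPES = {
    "MN":   {"V2": "V2-MN",   "V3": "V3-MN/GNE8", "C4": "C4-MN"},
    "GNE8": {"V2": "V2-GNE8", "V3": "V3-MN/GNE8", "C4": "C4-GNE8"},
}


def domain_coverage(selected, epitopes, frequencies):
    """Per domain, the summed regional frequency of the distinct epitope
    sequences carried by the selected isolates."""
    coverage = {}
    for domain, table in frequencies.items():
        distinct = {epitopes[name][domain] for name in selected}  # count each once
        coverage[domain] = sum(table.get(label, 0.0) for label in sorted(distinct))
    return coverage
```

`domain_coverage(["MN", "GNE8"], ISOLATE_EPITOPES, US_FREQUENCIES)` gives about 0.70 for V2, 0.60 for V3 and 0.90 for C4, matching the figures above.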

In a preferred embodiment of the method, one or more HIV isolates are selected whose amino acid sequences for a neutralizing epitope in the V2 and/or C4 domains constitute at least about 50% of the population for a selected geographic region. In a more preferred embodiment, isolates having the most common neutralizing epitopes in the V3 domain are also included in the vaccine.

Once the most common amino acid sequences for the neutralizing epitopes in the V2, V3, and C4 domains are known, an isolate having a common epitope in each domain is preferably selected. That is, when only two or three isolates are used in the vaccine, it is preferable to select isolates on the basis of common epitopes in all of the domains rather than by analysis of a single domain.

In a more preferred embodiment, the vaccine contains gp120 from isolates whose epitopes constitute at least 50% of the population for the geographic region in each of the V2, V3, and C4 domains. More preferably, the isolates have epitopes that constitute at least 60% of the population for the geographic region in each of the three domains. Most preferably, the epitopes constitute 70% or more.
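Choosing a small set of isolates that reaches a coverage target in all three domains at once is a set-selection problem. The text does not prescribe a procedure; the sketch below assumes a greedy heuristic that repeatedly adds the isolate giving the largest summed coverage gain across V2, V3 and C4, stopping once every domain meets the threshold (50%, 60% or 70% in the embodiments above). Greedy selection is not guaranteed to find the smallest set.

```python
def domain_coverage(selected, isolates, frequencies):
    """Per domain, the summed frequency of distinct epitopes carried by `selected`."""
    coverage = {}
    for domain, table in frequencies.items():
        labels = {isolates[name][domain] for name in selected}  # count each epitope once
        coverage[domain] = sum(table.get(label, 0.0) for label in sorted(labels))
    return coverage


def greedy_select(isolates, frequencies, threshold):
    """Add isolates one at a time, each time taking the largest summed coverage
    gain across domains, until every domain reaches `threshold` or no candidate helps."""
    selected = []
    while True:
        current = domain_coverage(selected, isolates, frequencies)
        if all(value >= threshold for value in current.values()):
            return selected
        candidates = [name for name in isolates if name not in selected]
        if not candidates:
            return selected  # threshold not reachable with these isolates

        def gain(name):
            trial = domain_coverage(selected + [name], isolates, frequencies)
            return sum(trial.values()) - sum(current.values())

        best = max(candidates, key=gain)
        if gain(best) <= 0:
            return selected
        selected.append(best)
```

With hypothetical frequencies and three hypothetical isolates, the first pick covers V2 and V3 well but leaves C4 at 45%, so the second pick is the isolate that adds the other common C4 epitope.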

In another preferred embodiment, the entire amino acid sequence of the V2 and C4 domains is determined during selection. In addition to having common sequences for the neutralizing epitopes, the isolates used for the vaccine preferably do not have unusual polymorphisms elsewhere in these domains.
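Screening a candidate for unusual polymorphisms can be sketched as checking, position by position, how often the candidate's residue appears in the regional population. This assumes the V2 or C4 sequences are already aligned to equal length and uses a hypothetical 5% rarity cut-off; neither detail comes from the text.

```python
from collections import Counter


def unusual_positions(candidate, aligned_population, min_fraction=0.05):
    """Return (position, residue) pairs (0-based) where the candidate's residue
    occurs in fewer than `min_fraction` of the aligned population sequences."""
    if not aligned_population:
        raise ValueError("population must not be empty")
    if any(len(seq) != len(candidate) for seq in aligned_population):
        raise ValueError("sequences must be pre-aligned to the candidate's length")
    n = len(aligned_population)
    flagged = []
    for pos, residue in enumerate(candidate):
        column = Counter(seq[pos] for seq in aligned_population)
        if column[residue] / n < min_fraction:
            flagged.append((pos, residue))
    return flagged
```

A candidate with no flagged positions has no rare residues in the domain at this cut-off. A candidate with flagged positions would, per the embodiment above, preferably not be used.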

Vaccine preparation

gp120 from the selected HIV isolate(s) is used to make a subunit vaccine, preferably a multivalent subunit vaccine. Preparation of gp120 for use in a vaccine is well known and is described below. Apart from the use of the selected HIV isolate(s), the gp120 subunit vaccine prepared by the method does not differ from gp120 subunit vaccines of the prior art.

As with prior art gp120 subunit vaccines, gp120 of the desired purity, at a concentration sufficient to induce antibody formation, is mixed with a physiologically acceptable carrier. A physiologically acceptable carrier is nontoxic to the recipient at the dosages and concentrations employed in the vaccine. Generally, the vaccine is formulated for injection, usually intramuscular or subcutaneous injection. Suitable carriers for injection include sterile water, but are preferably physiologic salt solutions, such as normal saline, or buffered salt solutions, such as phosphate buffered saline or Ringer's lactate. The vaccine generally contains an adjuvant. Useful adjuvants include QS21, which stimulates cytotoxic T-cells, and alum (aluminum hydroxide adjuvant). Formulations with other adjuvants that enhance cellular or local immunity can also be used.

Additional excipients that can be present in the vaccine include low molecular weight polypeptides (less than about 10 residues), proteins, amino acids, carbohydrates including glucose or dextrans, chelating agents such as EDTA, and other excipients.

The vaccine can also contain other HIV proteins. In particular, gp41 or the extracellular portion of gp41 can be present in the vaccine. Because gp41 has a conserved amino acid sequence, the gp41 in the vaccine can be from any HIV isolate. gp160 from an isolate used in the vaccine can replace gp120 from that isolate or be used together with it. Alternatively, gp160 from an isolate having a neutralizing epitope different from those of the vaccine isolates can additionally be present in the vaccine.

Vaccine formulations generally include a total of about 300 to 600 μg of gp120, conveniently in about 1.0 ml of carrier. The amount of gp120 from any isolate in the vaccine will vary depending on the immunogenicity of that gp120. For example, gp120 from the Thai strains of HIV is much less immunogenic than gp120 from the MN strain. If the two strains were used in combination, the amount of each strain's gp120 would be titrated empirically to determine its percentage in the vaccine. For isolates of similar immunogenicity, approximately equal amounts of each isolate's gp120 would be present in the vaccine. For example, in a preferred embodiment, the vaccine includes gp120 from the MN, GNE₈, and GNE₁₆ strains at about 300 μg per strain in about 1.0 ml of carrier. Methods of determining the relative amounts of immunogenic proteins in multivalent vaccines are well known and have been used, for example, to determine the relative proportions of various isolates in multivalent polio vaccines.
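Once relative proportions are fixed, apportioning a total gp120 dose among strains is simple arithmetic. A sketch in which the weights are hypothetical stand-ins for the outcome of the empirical titration described above (equal weights for strains of similar immunogenicity, a larger weight for a less immunogenic strain):

```python
def split_dose(total_ug, weights):
    """Apportion a total gp120 dose (micrograms) among strains in proportion to `weights`."""
    total_weight = sum(weights.values())
    if total_ug <= 0 or total_weight <= 0:
        raise ValueError("total dose and weights must be positive")
    return {strain: total_ug * w / total_weight for strain, w in weights.items()}
```

For example, `split_dose(600, {"strain_A": 1, "strain_B": 1})` gives 300 μg of each strain's gp120, or 300 μg/ml each if formulated in 1.0 ml of carrier. Doubling one weight, as `{"strain_A": 1, "strain_B": 2}`, gives 200 μg and 400 μg.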

The vaccines of this invention are administered in the same manner as prior art HIV gp120 subunit vaccines. In particular, the vaccines are generally administered at 0 and 1 months and again at 6, 8, or 12 months, depending on the protocol. After the immunization series, annual or bi-annual boosts can be administered. However, during and after the immunization series, neutralizing antibody levels can be assayed and the protocol adjusted accordingly.

The vaccine is administered to uninfected individuals. In addition, as with prior art HIV vaccines, the vaccine can be administered to seropositive individuals to augment the immune response to the virus. It is also contemplated that DNA encoding gp120 from the vaccine strains can be administered in a suitable vehicle for expression in the host. In this way, gp120 can be produced in the recipient, eliminating the need for repeated immunizations. Preparation of gp120 expression vehicles is described below.

Production of gp120

gp120 in the vaccine can be produced by any suitable means, as with prior art HIV gp120 subunit vaccines. For safety reasons, recombinantly produced or chemically synthesized gp120 is preferable to gp120 isolated directly from HIV. Methods for recombinant production of gp120 are described below.

DNA Encoding GNE₈ and GNE₁₆ gp120 and the resultant proteins

The present invention also provides novel DNA sequences encoding gp120 from the GNE₈ and GNE₁₆ isolates, which can be used to express gp120, and the resulting gp120 proteins. Nucleotide sequences of less than about 5 kilobases (Kb), preferably less than about 3 Kb, having the nucleotide sequences illustrated in Tables 1 and 2 encode gp120 from the GNE₈ and GNE₁₆ isolates, respectively. The sequences of the genes and the encoded proteins are shown below in Tables 1-3. In particular, Table 1 illustrates the nucleotide sequence (SEQ. ID. NO. 27) and the predicted amino acid sequence (SEQ. ID. NO. 28) of the GNE₈ isolate of HIV. The upper sequence is the coding strand. The table also illustrates the location of each restriction site.

    TABLE 1      -      hgiCI           ban1    scfI           bsp1286    pstI           bmyI styI  scfI bsgI      1 ATGATAGTGA AGGGGATCAG GAAGAATTGT CAGCACTTGT GGAGATGGGG CACCATGCTC     CTTGGGATGT TGATGATCTG TAGTGCTGCA GAAAAATTGT       TACTATCACT TCCCCTAGTC CTTCTTAACA GTCGTGAACA CCTCTACCCC GTGGTACGAG     GAACCCTACA ACTACTAGAC ATCACGACGT CTTTTTAACA      1 M  I  V  K G  I  R  K  N  C   Q  H  L  W R  W  G   T  M  L   L  G  M     L M  I  C   S  A  A   E  K  L      W                                            kpnI        hgiCI        banI        asp718        acc65I      ndeI      101 GGGTCACAGT CTATTATGGG GTACCTGTGT GGAAAGAAGC AACCACCACT CTATTTTGTG     CATCAGATGC TAAAGCATAT GATACAGAGG TACATAATGT       CCCAGTGTCA GATAATACCC CATGGACACA CCTTTCTTCG TTGGTGGTGA GATAAAACAC     GTAGTCTACG ATTTCGTATA CTATGTCTCC ATGTATTACA      35 V  T  V   Y  Y  G   V  P  V  W K  E  A   T  T  T   L  F  C  A S  D     A   K  A  Y   D  T  E  V H  N  V                                                       nspI       nspI       nspHI       nspHI      apoI aflIII      201 TTGGGCCACA CATGCCTGTG TACCCACAGA CCCCAACCCA CAAGAAATAG GATTGGAAAA     TGTAACAGAA AATTTTAACA TGTGGAAAAA TAACATGGTA       AACCCGGTGT GTACGGACAC ATGGGTGTCT GGGGTTGGGT GTTCTTTATC CTAACCTTTT     ACATTGTCTT TTAAAATTGT ACACCTTTTT ATTGTACCAT      68 W  A  T   H  A  C  V P  T  D  P  N  P Q  E  I  G L  E  N   V  T  E     N  F  N  M W  K  N   N  M  V                                                   ppu10I       nsiI/avaIII   hindIII   draIII  ahaIII/draI      301 GAACAGATGC ATGAGGATAT AATCAGTTTA TGGGATCAAA GCTTAAAGCC ATGTGTAAAA     TTAACCCCAC TATGTGTTAC TTTAAATTGC ACTGATTTGA       CTTGTCTACG TACTCCTATA TTAGTCAAAT ACCCTAGTTT CGAATTTCGG TACACATTTT     AATTGGGGTG ATACACAATG AAATTTAACG TGACTAAACT      101 E  Q  M  H E  D  I   I  S  L   W  D  Q  S L  K  P   C  V  K   L  T     P  L C  V  T   L  N  C   T  D  L      K                                          pvuII        speI nspBII      401 AAAATGCTAC TAATACCACT AGTAGCAGCT GGGGAAAGAT GGAGAGAGGA 
GAAATAAAAA     ACTGCTCTTT CAATGTCACC ACAAGTATAA GAGATAAGAT       TTTTACGATG ATTATGGTGA TCATCGTCGA CCCCTTTCTA CCTCTCTCCT CTTTATTTTT     TGACGAGAAA GTTACAGTGG TGTTCATATT CTCTATTCTA      135 N  A  T   N  T  T   S  S  S  W G  K  M   E  R  G   E  I  K  N C  S     F   N  V  T   T  S  I  R D  K  M                                                     scfI      501 GAAGAATGAA TATGCACTTT TTTATAAACT TGATGTAGTA CCAATAGATA ATGATAATAC     TAGCTATAGG TTGATAAGTT GTAACACCTC AGTCATTACA       CTTCTTACTT ATACGTGAAA AAATATTTGA ACTACATCAT GGTTATCTAT TACTATTATG     ATCGATATCC AACTATTCAA CATTGTGGAG TCAGTAATGT      168 K  N  E   Y  A  L  F Y  K  L   D  V  V  P  I  D  N D  N  T   S  Y     R   L  I  S  C N  T  S  V  I  T                                                 stuI    bsp1286       haeI    bmyI      601 CAGGCCTGTC CAAAGGTGTC CTTTGAGCCA ATTCCCATAC ATTATTGTGC CCCGGCTGGT     TTTGCGATTC TAAAGTGTAG AGATAAAAAG TTCAACGGAA       GTCCGGACAG GTTTCCACAG GAAACTCGGT TAAGGGTATG TAATAACACG GGGCCGACCA     AAACGCTAAG ATTTCACATC TCTATTTTTC AAGTTGCCTT      201 Q  A  C  P K  V  S  F  E  P   I  P  I  H Y  C  A  P  A  G   F  A  I      L K  C  R   D  K  K   F  N  G      T                                          bsp1407I   bsp1407I haeI          701 CAGGACCATG TACAAATGTC AGCACAGTAC AATGTACACA TGGAATTAGG CCAGTAGTA     T CAACTCAACT GCTGTTAAAT GGCAGTTTAG CAGAAGAAGA       GTCCTGGTAC ATGTTTACAG TCGTGTCATG TTACATGTGT ACCTTAATCC GGTCATCATA     GTTGAGTTGA CGACAATTTA CCGTCAAATC GTCTTCTTCT      235 G  P  C   T  N  V   S  T  V  Q C  T  H  G  I  R   P  V  V  S T  Q     L   L  L  N   G  S  L  A E  E  E                                                bstYI/xhoII    pvuII     bsp1407I        bglII    nspBII scfI aseI/asnI/vspI      801 AGTAGTAATT AGATCTGCCA ATTTCTCGGA CAATGCTAAA ACCATAATAG TACAGCTGAA     CGAATCTGTA GAAATTAATT GTACAAGACC CAACAACAAT       TCATCATTAA TCTAGACGGT TAAAGAGCCT GTTACGATTT TGGTATTATC ATGTCGACTT     GCTTAGACAT CTTTAATTAA CATGTTCTGG GTTGTTGTTA      268 V  V  I   R  S  A  N F  S  D  N  A  K   T  I  I  
V Q  L  N   E  S     V   E  I  N  C T  R  P   N  N  N                                                bst1107I        accI      901 ACAAGAAGAA GTATACATAT AGGACCAGGG AGAGCATTTT ATGCAACAGG AGAAATAATA     GGAGACATAA GACAAGCACA TTGTAACCTT AGTAGCACAA       TGTTCTTCTT CATATGTATA TCCTGGTCCC TCTCGTAAAA TACGTTGTCC TCTTTATTAT     CCTCTGTATT CTGTTCGTGT AACATTGGAA TCATCGTGTT      301 T  R  R  S I  H  I  G  P  G   R  A  F  Y A  T  G   E  I  I   G  D     I  R Q  A  H   C  N  L   S  S  T      K                                         ppuMI        eco81I      eco0109I/draII                                                 ahaIII/dr     aI  bsu36I/mstII/sauI      1001 AATGGAATAA TACTTTAAAA CAGATAGTTA CAAAATTAAG AGAACATTTT AATAAAACAA     TAGTCTTTAA TCACTCCTCA GGAGGGGACC CAGAAATTGT       TTACCTTATT ATGAAATTTT GTCTATCAAT GTTTTAATTC TCTTGTAAAA TTATTTTGTT     ATCAGAAATT AGTGAGGAGT CCTCCCCTGG GTCTTTAACA      335 W  N  N  T  L  K  Q  I  V  T K  L  R   E  H  F   N  K  T  I V  F  N       H  S  S   G  G  D  P E  I  V                                                    apoI    scaI   eco57I          1101 AATGCACAGT TTTAATTGTG GAGGGGAATT TTTCTACTGT AATACAACAC     CACTGTTTAA TAGTACTTGG AATTATACTT ATACTTGGAA TAATACTGAA       TTACGTGTCA AAATTAACAC CTCCCCTTAA AAAGATGACA TTATGTTGTG GTGACAAATT     ATCATGAACC TTAATATGAA TATGAACCTT ATTATGACTT      368 M  H  S   F  N  C  G G  E  F   F  Y  C   N  T  T  P L  F  N   S  T     W   N  Y  T  Y T  W  N   N  T  E                                                    nspI            nspHI            aflIII      1201 GGGTCAAATG ACACTGGAAG AAATATCACA CTCCAATGCA GAATAAAACA AATTATAAAC     ATGTGGCAGG AAGTAGGAAA AGCAATGTAT GCCCCTCCCA       CCCAGTTTAC TGTGACCTTC TTTATAGTGT GAGGTTACGT CTTATTTTGT TTAATATTTG     TACACCGTCC TTCATCCTTT TCGTTACATA CGGGGAGGGT      401 G  S  N  D T  G  R  N  I  T   L  Q  C  R I  K  Q   I  I  N   M  W     Q  E V  G  K   A  M  Y   A  P  P      I                                          eco57I      mamI   bstYI/xholI      gsuI/bpmI                      
                     bsaBI sspI  bglII     ecoNI      1301 TAAGAGGACA AATTAGATGC TCATCAAATA TTACAGGGCT GCTATTAACA AGAGATGGTG     GTAATAACAG CGAAACCGAG ATCTTCAGAC CTGGAGGAGG       ATTCTCCTGT TTAATCTACG AGTAGTTTAT AATGTCCCGA CGATAATTGT TCTCTACCAC     CATTATTGTC GCTTTGGCTC TAGAAGTCTG GACCTCCTCC      435 R  G  Q   I  R  C   S  S  N  I T  G  L   L  L  T   R  D  G  G N  N     S  E  T  E   I  F  R  P G  G  G                                                 munI      styI  earl/ksp632I      1401 AGATATGAGG GACAATTGGA GAAGTGAATT ATATAAATAT AAAGTAGTAA AAATTGAACC     ATTAGGAGTA GCACCCACCA AGGCAAAGAG AAGAGTGATG       TCTATACTCC CTGTTAACCT CTTCACTTAA TATATTTATA TTTCATCATT TTTAACTTGG     TAATCCTCAT CGTGGGTGGT TCCGTTTCTC TTCTCACTAC      468 D  M  R   D  N  W  R S  E  L  Y  K  Y  K  V  V  K I  E  P   L  G  V       A  P  T  K A  K  R  R  V  M                                                     styI      1501 CAGAGAGAAA AAAGAGCAGT GGGAATAGGA GCTGTGTTCC TTGGGTTCTT GGGAGCAGCA     GGAAGCACTA TGGGCGCAGC GTCAGTGACG CTGACGGTAC       GTCTCTCTTT TTTCTCGTCA CCCTTATCCT CGACACAAGG AACCCAAGAA CCCTCGTCGT     CCTTCGTGAT ACCCGCGTCG CAGTCACTGC GACTGCCATG      501 Q  R  E  K R  A  V  G  I  G   A  V  F  L G  F  L   G  A  A   G  S     T  M G  A  A   S  V  T  L  T  V      Q                                         haeI      alwNI      1601 AGGCCAGACT ATTATTGTCT GGTATAGTGC AACAGCAGAA CAATTTGCTG AGGGCTATTG     AGGCCGAACA GCATCTGTTG CAACTCACAG TCTGGGGCAT       TCCGGTCTGA TAATAACAGA CCATATCACG TTGTCGTCTT GTTAAACGAC TCCCGATAAC     TCCGGCTTGT CGTAGACAAC GTTGAGTGTC AGACCCCGTA      535 A  R  L  L  L  S   G  I  V  Q Q  Q  N   N  L  L   R  A  I  E A  E     Q   H  L  L   Q  L  T  V W  G  I                                                gsul/bpmI     alwNI      1701 CAAGCAGCTC CAGGCAAGAG TCCTGGCTGT GGAGAGATAC CTAAAGGATC AACAGCTCCT     GGGGATTTGG GGTTGCTCTG GAAAACTCAT CTGCACCACT       GTTCGTCGAG GTCCGTTCTC AGGACCGACA CCTCTCTATG GATTTCCTAG TTGTCGAGGA     CCCCTAAACC CCAACGAGAC CTTTTGAGTA GACGTGGTGA      568 K  Q  L   Q 
 A  R  V L  A  V  E  R  Y   L  K  D  Q Q  L  L   G  I     W   G  C  S  G K  L  I   C  T  T                                               styI bsmI        hindIII            1801 GCTGTGCCTT GGAATGCTAG TTGGAGTAAT AAATCTCTGG ATAAGATTTG     GGATAACATG ACCTGGATGG AGTGGGAAAG AGAAATTGAC AATTACACAA       CGACACGGAA CCTTACGATC AACCTCATTA TTTAGAGACC TATTCTAAAC CCTATTGTAC     TGGACCTACC TCACCCTTTC TCTTTAACTG TTAATGTGTT      601 A  V  P  W N  A  S  W  S  N   K  S  L  D K  I  W   D  N  M   T  W     M  E W  E  R  E  I  D   N  Y  T      S                                        1901 GCTTAATATA CAGCTTAATT     GAAGAATCGC AGAACCAACA AGAAAAAAAT GAACAAGAAT TATTGGAATT AGATAAATGG     GCAAGTTTGT GGAATTGGTT       CGAATTATAT GTCGAATTAA CTTCTTAGCG TCTTGGTTGT TCTTTTTTTA CTTGTTCTTA     ATAACCTTAA TCTATTTACC CGTTCAAACA CCTTAACCAA      635 L  I  Y  S  L  I   E  E  S  Q N  Q  Q   E  K  N   E  Q  E  L L  E     L   D  K  W   A  S  L  W N  W  F                                                  sspI     scfI      2001 TGACATAACA AAATGGCTGT GGTATATAAA AATATTCATA ATGATAGTAG GAGGCTTGGT     AGGTTTAAGA ATAGTTTTTA CTGTACTTTC TATAGTGAAT       ACTGTATTGT TTTACCGACA CCATATATTT TTATAAGTAT TACTATCATC CTCCGAACCA     TCCAAATTCT TATCAAAAAT GACATGAAAG ATATCACTTA      668 D  I  T   K  W  L  W Y  I  K   I  F  I   M  I  V  G G  L  V   G  L     R   I  V  F  T V  L  S   I  V  N                                                    avaI      2101 AGAGTTAGGA AGGGATACTC ACCATTATCG TTCCAGACCC ACCTCCCAGC CCCGAGGGGA     CTCGACAGGC CCGAAGGAAC CGAAGAAGAA GGTGGAGAGC       TCTCAATCCT TCCCTATGAG TGGTAATAGC AAGGTCTGGG TGGAGGGTCG GGGCTCCCCT     GAGCTGTCCG GGCTTCCTTG GCTTCTTCTT CCACCTCTCG      701 R  V  R  K G  Y  S   P  L  S   F  Q  T  H L  P  A   P  R  G   L  D     R  P E  G  T   E  E  E   G  G  E      R                                           bspMI          salI      xcmI    hincII/hindlI eco57I      bstYI/xhoII   munI accI earI/ksp632I      2201 GAGACAGAGA CAGATCCAGT CGATTAGTGG ATGGATTCTT AGCAATTGTC TGGGTCGACC     TGCGGAGCCT 
GTGCCTCTTC AGCTACCACC GCTTGAGAGA       CTCTGTCTCT GTCTAGGTCA GCTAATCACC TACCTAAGAA TCGTTAACAG ACCCAGCTGG     ACGCCTCGGA CACGGAGAAG TCGATGGTGG CGAACTCTCT      735 D  R  D   R  S  S   R  L  V  D G  F  L   A  I  V   W  V  D  L R  S     L   C  L  F   S Y  H  R L  R  D                                                       sspI scfI      2301 CTTACTCTTG ATTGCAGCGA GGATTGTGGA ACTTCTGGGA CGCAGGGGGT GGGAAGCCCT     CAAATATTGG TGGAATCTCC TACAGTATTG GATTCAGGAA       GAATGAGAAC TAACGTCGCT CCTAACACCT TGAAGACCCT GCGTCCCCCA CCCTTCGGGA     GTTTATAACC ACCTTAGAGG ATGTCATAAC CTAAGTCCTT      768 L  L  L   I  A  A  R I  V  E   L  L  G   R  R  G  W E  A  L  K  Y     W   W  N  L  L Q  Y  W   I  Q  E                                                   alwNI      2401 CTAAAGAATA GTGCTGTTAG CTTGCTCAAT GCCACAGCCA TAGCAGTAGC TGAGGGAACA     GATAGGGTTA TAGAAATAGT ACAAAGAGCT TATAGAGCTA       GATTTCTTAT CACGACAATC GAACGAGTTA CGGTGTCGGT ATCGTCATCG ACTCCCTTGT     CTATCCCAAT ATCTTTATCA TGTTTCTCGA ATATCTCGAT      801 L  K  N  S A  V  S   L  L  N   A  T  A  I A  V  A   E  G  T   D  R     V  I E  I  V   Q  R  A   Y  R  A      I                                       2501 TTCTCCACAT ACCCACACGA     ATAAGACAGG GCTTGGAAAG GGCTTTGCTA      TAA                                     AAGAGGTGTA TGGGTGTGCT TATTCTGTCC      CGAACCTTTC CCGAAACGAT      ATT                                              835 L  H  I   P  T  R      I  R  Q  G L  E  R   A  L  L      O

Table 2 illustrates the nucleotide sequence and the predicted amino acid sequence of the GNE₁₆ isolate of HIV. The upper sequence is the coding strand. The table also illustrates the location of each restriction site. The first four pages of the table are from one clone of the gene and the second three pages are from another clone; the sequences of the two clones differ by about 2%. (The nucleotide sequences are SEQ. ID. NOs. 29. The amino acid sequences are SEQ. ID. NOs. 32 and 33, respectively.) Each of the sequences includes a stop codon, so a gene sequence encoding full length gp120 can be made by repairing one of the sequences.

    TABLE 2      - hgiCI      banI    scfI      bsp1286    pstI      bmyI styI  scfI bsgI      1 ATGAGAGTGA AGGGGATCAG GAGGAATTAT CAGCACTTGT GGAGATGGGG CACCATGCTC     CTTGGGATAT TGATGATCTG TAGTGCTGCA GGGAAATTGT       TACTCTCACT TCCCCTAGTC CTCCTTAATA GTCGTGAACA CCTCTACCCC GTGGTACGAG     GAACCCTATA ACTACTAGAC ATCACGACGT CCCTTTAACA      1 M  R  V  K G  I  R  R  N  Y   Q  H  L  W R  W  G   T  M  L   L  G  I     L M  I  C   S  A  A   G  K  L      W                                            kpnI        hgiCI        banI        asp718        acc65I      ndeI      101 GGGTCACAGT CTATTATGGG GTACCTGTGT GGAAAGAAAC AACCACCACT CTATTTTGTG     CATCAGATGC TAAAGCATAT GATACAGAGA TACATAATGT       CCCAGTGTCA GATAATACCC CATGGACACA CCTTTCTTTG TTGGTGGTGA GATAAAACAC     GTAGTCTACG ATTTCGTATA CTATGTCTCT ATGTATTACA      35 V  T  V   Y  Y  G   V  P  V  W K  E  T   T  T  T   L  F  C  A S  D     A   K  A  Y   D  T  E  I H  N  V                                                      nspI       nspI       nspHI       nspHI      apoI aflIII      201 TTGGGCCACA CATGCCTGTG TACCCACAGA CCCCAACCCA CAAGAAGTAG TATTGGAAAA     TGTGACAGAA AATTTTAACA TGTGGAAAAA TAACATGGTG       AACCCGGTGT GTACGGACAC ATGGGTGTCT GGGGTTGGGT GTTCTTCATC ATAACCTTTT     ACACTGTCTT TTAAAATTGT ACACCTTTTT ATTGTACCAC      68 W  A  T   H  A  C  V P  T  D   P  N  P   Q  E  V  V L  E  N   V  T     E   N  F  N  M W  K  N   N  M  V                                               ppu10I       nsiI/avaIII    ahaIII/draI  draIII  ahaIII/draI      301 GAACAGATGC ATGAGGATAT AATCAGTTTA TGGGATCAAA GTTTAAAGCC ATGTGTAAAA     TTAACCCCAC TCTGTGTTAC TTTAAATTGC ACTGATGCGG       CTTGTCTACG TACTCCTATA TTAGTCAAAT ACCCTAGTTT CAAATTTCGG TACACATTTT     AATTGGGGTG AGACACAATG AAATTTAACG TGACTACGCC      101 E  Q  M  H E  D  I   I  S  L   W  D  Q  S L  K  P   C  V  K   L  T     P  L C  V  T   L  N  C   T  D  A      G                                           gsul/bpmI      401 GGAATACTAC TAATACCAAT AGTAGTAGCA GGGAAAAGCT GGAGAAAGGA GAAATAAAAA     ACTGCTCTTT 
CAATATCACC ACAAGCGTGA GAGATAAGAT       CCTTATGATG ATTATGGTTA TCATCATCGT CCCTTTTCGA CCTCTTTCCT CTTTATTTTT     TGACGAGAAA GTTATAGTGG TGTTCGCACT CTCTATTCTA      135 N  T  T   N  T  N   S  S  S  R E  K  L  E  K  G   E  I  K  N C  S     F   N  I  T   T  S  V  R D  K  M      421, reverse                       scaI scaI scfI      501 GCAGAAAGAA ACTGCACTTT TTAATAAACT TGATATAGTA CCAATAGATG ATGATGATAG     GAATAGTACT AGGAATAGTA CTAACTATAG GTTGATAAGT       CGTCTTTCTT TGACGTGAAA AATTATTTGA ACTATATCAT GGTTATCTAC TACTACTATC     CTTATCATGA TCCTTATCAT GATTGATATC CAACTATTCA      168 Q  K  E   T  A  L  F N  K  L   D  I  V   P  I  D  D D  D  R   N  S     T   R  N  S  T N  Y  R   L  I  S      43r2 , reverse                    stuI         haeI      601 TGTAACACCT CAGTCATTAC ACAGGCCTGT CCAAAGGTAT CATTTGAGCC AATTCCCATA     CATTTCTGTA CCCCGGCTGG TTTTGCGCTT CTAAAGTGTA       ACATTGTGGA GTCAGTAATG TGTCCGGACA GGTTTCCATA GTAAACTCGG TTAAGGGTAT     GTAAAGACAT GGGGCCGACC AAAACGCGAA GATTTCACAT      201 C  N  T  S V  I  T   Q  A  C  P  K  V  S F  E  P   I  P  I   H  F     C  T P  A  G   F  A  L   L  K  C      N                                             bsp1407I haeI      701 ATAATAAGAC GTTCAATGGA TCAGGACCAT GCAAAAATGT CAGCACAGTA CAATGTACAC     ATGGAATTAG GCCAGTAGTA TCAACTCAAC TGCTGTTAAA       TATTATTCTG CAAGTTACCT AGTCCTGGTA CGTTTTTACA GTCGTGTCAT GTTACATGTG     TACCTTAATC CGGTCATCAT AGTTGAGTTG ACGACAATTT      235 N  K  T   F  N  G   S  G  P  C K  N  V   S  T  V   Q  C  T  H G  I     R   P  V  V   S  T  Q  L L  L  N                                                  bstYI/xhoII    pvuII     aseI/asnI/          bglII      apoI    nspBII  vspI                                          801     TGGCAGTCTA GCAGAAGGAG AGGTAGTAAT TAGATCTGAA AATTTCACGA ACAATGCTAA     AACCATAATA GTACAGCTGA CAGAACCAGT AAAAATTAAT       ACCGTCAGAT CGTCTTCCTC TCCATCATTA ATCTAGACTT TTAAAGTGCT TGTTACGATT     TTGGTATTAT CATGTCGACT GTCTTGGTCA TTTTTAATTA      268 G  S  L   A  E  G  E V  V  I   R  S  E   N  F  T  N N  A  K   T  I     I 
  V  Q  L  T E  P  V   K  I  N      f1, forward                 bst1107I       bsp1407I   accI scfI      901 TGTACAAGAC CCAACAACAA TACAAGAAAA AGTATACCTA TAGGACCAGG GAGAGCATTT     TATGCAACAG GAGACATAAT AGGAAATATA AGACAAGCAC       ACATGTTCTG GGTTGTTGTT ATGTTCTTTT TCATATGGAT ATCCTGGTCC CTCTCGTAAA     ATACGTTGTC CTCTGTATTA TCCTTTATAT TCTGTTCGTG      301 C  T  R  P N  N  N   T  R  K   S  I  P  I G  P  G   R  A  F   Y  A     T  G D  I  I   G  N  I   R  Q  A      H         875,reverse                            eco81I                bsu36I/                mstII/                sauI      1001 ATTGTAACCT TAGTAGAACA GACTGGAATA ACACTTTAGG ACAGATAGTT GAAAAATTAA     GAGAACAATT TGGGAATAAA ACAATAATCT TTAATCACTC       TAACATTGGA ATCATCTTGT CTGACCTTAT TGTGAAATCC TGTCTATCAA CTTTTTAATT     CTCTTGTTAA ACCCTTATTT TGTTATTAGA AATTAGTGAG      335 C  N  L   S  R  T   D  W  N  N T  L  G   Q  I  V   E  K  L  R E  Q     F   G  N  K   T  I  I  F N  H  S                                               ppuMI       eco0109I/dralI    apoI   munI scaI      1101 CTCAGGAGGG GACCCAGAAA TTGTAATGCA CAGTTTTAAT TGTAGAGGGG AATTTTTCTA     CTGTAATACA ACACAATTGT TTGACAGTAC TTGGGATAAT       GAGTCCTCCC CTGGGTCTTT AACATTACGT GTCAAAATTA ACATCTCCCC TTAAAAAGAT     GACATTATGT TGTGTTAACA AACTGTCATG AACCCTATTA      368 S  G  G   D  P  E  I V  M  H   S  F  N   C  R  G  E F  F  Y   C  N     T   T  Q  L  F D  S  T   W  D  N                                                      nspI         earI/ksp632I     nspHI         eco57I     aflIII      1201 ACTAAAGTGT CAAATGGCAC TAGCACTGAA GAGAATAGCA CAATCACACT CCCATGCAGA     ATAAAGCAAA TTGTAAACAT GTGGCAGGAA GTAGGAAAAG       TGATTTCACA GTTTACCGTG ATCGTGACTT CTCTTATCGT GTTAGTGTGA GGGTACGTCT     TATTTCGTTT AACATTTGTA CACCGTCCTT CATCCTTTTC      401 T  K  V  S N  G  T   S  T  E   E  N  S  T I  T  L   P  C  R   I  K     Q  I V  N  M   W  Q  E   V  G  K      A                                           mamI          bsaBI sspI bsaI      1301 CAATGTATGC CCCTCCCATC AGAGGACAAA TTAGATGTTC 
ATCAAATATT ACAGGGTTGC     TATTAACAAG AGATGGAGGT AGTAACAACA GCATGAATGA       GTTACATACG GGGAGGGTAG TCTCCTGTTT AATCTACAAG TAGTTTATAA TGTCCCAACG     ATAATTGTTC TCTACCTCCA TCATTGTTGT CGTACTTACT      435 M  Y  A  P  P  I   R  G  Q  I R  C  S   S  N  I   T  G  L  L L  T     R   D  G  G   S  N  N  S M  N  E      2, 16.7f3, forward               gsuI/bpmI       eco57I ecoNI  munI      styI      1401 GACCTTCAGA CCTGGAGGAG GAGATATGAG GGACAATTGG AGAAGTGAAT TATACAAATA     TAAAGTAGTA AAAATTGAAC CATTAGGAGT AGCACCCACC       CTGGAAGTCT GGACCTCCTC CTCTATACTC CCTGTTAACC TCTTCACTTA ATATGTTTAT     ATTTCATCAT TTTTAACTTG GTAATCCTCA TCGTGGGTGG      468 T  F  R   P  G  G  G D  M  R   D  N  W   R  S  E  L Y  K  Y   K  V     V   K  I E  P L  G  V   A  P  T      c4rev4,reverse                  earI/ksp632I    styI      1501 AAGGCAAAGA GAAGAGTGGT GCAGAGAGAA AAAAGAGCAG TGGGAATAGG AGCTGTGTTC     CTTGGGTTCT TAGGAGCAGC AGGAAGCACT ATGGGCGCAG       TTCCGTTTCT CTTCTCACCA CGTCTCTCTT TTTTCTCGTC ACCCTTATCC TCGACACAAG     GAACCCAAGA ATCCTCGTCG TCCTTCGTGA TACCCGCGTC      501 K  A  K  R R  V  V   Q  R  E   K  R  A  V G  I  G  A  V  F   L  G     F  L G  A  A   G  S  T   M  G  A      A                                          haeI      alwNI      1601 CGTCAATAAC GCTGACGGTA CAGGCCAGAC TATTATTGTC TGGTATAGTG CAACAGCAGA     ACAATTTGCT GAGGGCTATT GAGGCGCAAC AGCATCTGTT       GCAGTTATTG CGACTGCCAT GTCCGGTCTG ATAATAACAG ACCATATCAC GTTGTCGTCT     TGTTAAACGA CTCCCGATAA CTCCGCGTTG TCGTAGACAA      535 S  I  T   L  T  V   Q  A  R  L L  L  S   G  I  V   Q  Q  Q  N N  L     L   R  A  I   E  A  Q  Q H  L  L                                                   43f5,forward      43r3,reverse             eco81I      alwNI                                                        gsuI/bpmI       bsu36I/mstII/sauI      1701 GCAACTCATA GTCTGGGGCA TCAAGCAGCT CCAGGCAAGA GTCCTGGCTG TGGAAAGATA     CCTAAGGGAT CAACAGCTCC TGGGGATTTG GGGTTGCTCT       CGTTGAGTAT CAGACCCCGT AGTTCGTCGA GGTCCGTTCT CAGGACCGAC ACCTTTCTAT     GGATTCCCTA GTTGTCGAGG 
ACCCCTAAAC CCCAACGAGA      568 Q  L  I   V  W  G  I K  Q  L   Q  A  R   V  L  A  V E  R  Y   L  R     D   Q  Q  L  L G  I  W   G  C  S                                                 styI bsmI  xbaI      1801 GGAAAACTCA TTTGCACCAC CTCAGTGCCT TGGAATGCTA GTTGGAGTAA TAAATCTCTA     GATAAGATTT GGGATAACAT GACCTGGATG GAGTGGGAAA       CCTTTTGAGT AAACGTGGTG GAGTCACGGA ACCTTACGAT CAACCTCATT ATTTAGAGAT     CTATTCTAAA CCCTATTGTA CTGGACCTAC CTCACCCTTT      601 G  K  L  I C  T  T   S  V  P   W  N  A  S W  S  N   K  S  L  D  K     I  W D  N  M   T  W  M   E  W  E      R                                         hindIII      1901 GAGAAATTGA GAATTACACA AGCTTAATAT ACACCTTAAT TGAAGAATCG CAGAACCAAC     AAGAAAAGAA TGAACAAGAC TTATTGGAAT TGGATCAATG       CTCTTTAACT CTTAATGTGT TCGAATTATA TGTGGAATTA ACTTCTTAGC GTCTTGGTTG     TTCTTTTCTT ACTTGTTCTG AATAACCTTA ACCTAGTTAC      635 E  I  E   N  Y  T   S  L  I  Y T  L  I   E  E  S   Q  N  Q  Q E  K     N   E  Q  D   L  L  E  L D  Q  W                                                    sspI      2001 GGCAAGTCTG TGGAATTGGT TTAGCATAAC AAAATGGCTG TGGTATATAA AAATATTCAT     AATGATAGTT GGAGGCTTGG TAGGTTTAAG AATAGTTTTT       CCGTTCAGAC ACCTTAACCA AATCGTATTG TTTTACCGAC ACCATATATT TTTATAAGTA     TTACTATCAA CCTCCGAACC ATCCAAATTC TTATCAAAAA      668 A  S  L   W  N  W  F S  I  T   K  W  L   W  Y  I  K I  F  I   M  I     V   G  G  L  V G  L  R   I  V  F                                                 43f6, forward  2000,reverse         scfI      avaI      bsaI                                                  2101 GCTGTACTTT     CTATAGTGAA TAGAGTTAGG CAGGGATACT CACCATTATC GTTTCAGACC CGCCTCCCAG     CCCCGAGGAG ACCCGACAGG CCCGAAGGAA       CGACATGAAA GATATCACTT ATCTCAATCC GTCCCTATGA GTGGTAATAG CAAAGTCTGG     GCGGAGGGTC GGGGCTCCTC TGGGCTGTCC GGGCTTCCTT      701 A  V  L  S I  V  N   R  V  R   Q  G  Y  S P  L  S  F  Q  T   R  L     P  A P  R  R   P  D  R   P  E  G      I                                           xcmI      eco57I          bstYI/xholI      earI/ksp632I   
   2201 TCGAAGAAGA AGGTGGAGAG CAAGGCAGAG ACAGATCCAT TCGCTTAGTG GATGGATTCT     TAGCACTTAT CTGGGACGAC CTACGGAGCC TGTGCCTCTT       AGCTTCTTCT TCCACCTCTC GTTCCGTCTC TGTCTAGGTA AGCGAATCAC CTACCTAAGA     ATCGTGAATA GACCCTGCTG GATGCCTCGG ACACGGAGAA      735 E  E  E   G  G  E   Q  G  R  D R  S  I   R  L  V   D  G  F  L A  L     I   W  D  D   L  R  S  L C  L  F      r1,reverse                            sspI      2301 CAGCTACCAC CGCTTGAGAG ACTTACTCTT GATTGCAACG AGGATTGTGG AACTTCTGGG     ACGCAGGGGG TGGGAAGCCC TCAAATATTG GTGGAATCTC       GTCGATGGTG GCGAACTCTC TGAATGAGAA CTAACGTTGC TCCTAACACC TTGAAGACCC     TGCGTCCCCC ACCCTTCGGG AGTTTATAAC CACCTTAGAG      768 S  Y  H   R  L  R  D L  L  L   I  A  T   R  I  V  E L  L  G   R  R     G   W  E  A  L K  Y  W   W  N  L                                               scfI      alwNI      2401 CTACAGTATT GGATTCAGGA ACTAAAGAAT AGTGCTGTTA GCTTGCTTAA TGTCACAGCC     ATAGCAGTAG CTGAGGGGAC AGATAGGGTT TTAGAAGTAT       GATGTCATAA CCTAAGTCCT TGATTTCTTA TCACGACAAT CGAACGAATT ACAGTGTCGG     TATCGTCATC GACTCCCCTG TCTATCCCAA AATCTTCATA      801 L  Q  Y  W I  Q +E  L  K  N   S  A  V  S L  L  N  V  T  A   I  A  V      A E  G  T  D  R  V   P  E  V      L                                          2501 TACAAAGAGC TTATAGAGCT     ATTCTCCACA TACCTACAAG AATAAGACAG GGCTTGGAAA GGGCTTTGCT      ATAA              ATGTTTCTCG AATATCTCGA TAAGAGGTGT ATGGATGTTC TTATTCTGTC      CCGAACCTTT CCCGAAACGA      TATT                                             835 Q  R  A  Y  R  A     I  L  H  I P  T  R  I  R  Q   G  L  E  R A  L  L      O                         hgiCI           banI    scfI           bsp1286    pstI       earl/ksp632I    bmyI styI  scfI bsgI      1 ATGAGAGTGA AGAGGATCAG GAGGAATTAT CAGCACTTGT GGAAATGGGG CACCATGCTC     CTTGGGATGT TGATGATCTG TAGTGCTGCA GGAAAATTGT       TACTCTCACT TCTCCTAGTC CTCCTTAATA GTCGTGAACA CCTTTACCCC GTGGTACGAG     GAACCCTACA ACTACTAGAC ATCACGACGT CCTTTTAACA      1 M  R  V  K R  I  R   R  N  Y   Q  H  L  W K  W  G   T  M  L   L  G  M      L M 
 I  C   S  A  A   G  K  L      W                                           kpn I        hgiCI        banI        asp718        acc65I      ndeI      101 GGGTCACAGT CTATTATGGG GTACCTGTGT GGAAAGAAAC AACCACCACT CTATTTTGTG     CATCAGATGC TAAAGCATAT GATACAGAGA TACATAATGT       CCCAGTGTCA GATAATACCC CATGGACACA CCTTTCTTTG TTGGTGGTGA GATAAAACAC     GTAGTCTACG ATTTCGTATA CTATGTCTCT ATGTATTACA      35 V  T  V   Y  Y  G   V  P  V  W K  E  T   T  T  T   L  F  C  A S  D     A   K  A  Y   D  T  E  I H  N  V                                                      nspI       nspI       nspHI       nspHI      apoI aflIII      201 TTGGGCCACA CATGCCTGTG TACCCACAGA CCCCAACCCA CAAGAAGTAG TATTGGAAAA     TGTGACAGAA AATTTTAACA TGTGGAAAAA TAACATGGTG       AACCCGGTGT GTACGGACAC ATGGGTGTCT GGGGTTGGGT GTTCTTCATC ATAACCTTTT     ACACTGTCTT TTAAAATTGT ACACCTTTTT ATTGTACCAC      68 W  A  T   H  A  C  V P  T  D   P  N  P   Q  E  V `V L  E  N   V  T     E  A  N  F  N  M W  K  N   N  M  V                                             ppu10I       nsil/avaIII      draIII  ahaIII/draI      301 GAACAGATGC ATGAGGATAT AATCAGTTTA TGGGATCAAA GTCTAAAGCC ATGTGTAAAA     TTAACCCCAC TCTGTGTTAC TTTAAATTGC ACTGATGCGG       CTTGTCTACG TACTCCTATA TTAGTCAAAT ACCCTAGTTT CAGATTTCGG TACACATTTT     AATTGGGGTG AGACACAATG AAATTTAACG TGACTACGCC      101 E  Q  M  H E  D  I   I  S  L   W  D  Q  S L  K  P   C  V  K   L  T     P  L C  V  T   L  N  C   T  D  A      G                                           gsul/bpmI      401 GGAATACTAC TAATACCAAT AGTAGTAGCG GGGAAAAGCT GGAGAAAGGA GAAATAAAAA     ACTGCTCTTT CAATATCACC ACAAGCATGA GAGATAAGAT       CCTTATGATG ATTATGGTTA TCATCATCGC CCCTTTTCGA CCTCTTTCCT CTTTATTTTT     TGACGAGAAA GTTATAGTGG TGTTCGTACT CTCTATTCTA      135 N  T  T   N  T  N   S  S  S  G E  K  L   E  K  G   E  I  K  N C  S     F   N  I  T   T  S  M  R D  K  M                                                     scaI scaI scfI      501 GCAGAGAGAA ACTGCACTTT TTAATAAACT TGATATAGTA CCAATAGATG ATGATGATAG     GAATAGTACT AGGAATAGTA 
CTAACTATAG GTTGATAAGT       CGTCTCTCTT TGACGTGAAA AATTATTTGA ACTATATCAT GGTTATCTAC TACTACTATC     CTTATCATGA TCCTTATCAT GATTGATATC CAACTATTCA      168 Q  R  E   T  A  L  F N  K  L   D  I  V   P  I  D  D D  D  R   N  S     T   R  N  S  T N  Y  R   L  I  S                                                 stuI         haeI      601 TGTAACACCT CAGTCATTAC ACAGGCCTGT CCAAAGGTAT CATTTGAGCC AATTCCCATA     CATTTCTGTA CCCCGGCTGG TTTTGCGCTT CTAAAGTGTA       ACATTGTGGA GTCAGTAATG TGTCCGGACA GGTTTCCATA GTAAACTCGG TTAAGGGTAT     GTAAAGACAT GGGGCCGACC AAAACGCGAA GATTTCACAT      201 C  N  T  S V  I  T   Q  A  C   P  K  V  S F  E  P   I  P  I   H  F     C  T P  A  G   F  A  L   L  K  C      N                                        esp3I    scaI bsp1407I haeI         701 ATAATGAGAC GTTCAATGGA TCAGGACCAT GCAAAAATGT CAGCACAGTA CTATGTACAC      ATGGAATTAG GCCAGTAGTA TCAACTCAAC TGCTGTTAAA       TATTACTCTG CAAGTTACCT AGTCCTGGTA CGTTTTTACA GTCGTGTCAT GATACATGTG     TACCTTAATC CGGTCATCAT AGTTGAGTTG ACGACAATTT      235 N  E  T   F  N  G   S  G  P  C K  N  V   S  T  V   L  C  T  H G  I     R 11  P  V  V   S  T  Q  L L  L  N                                                bstYI/xhoII      aseI/asnI/          earI/ksp632I  bglII      apoI      vspI                                  801 TGGCAGTCTA GCAGGAGAA     G AGGTAGTAAT TAGATCTGAA AATTTCACGA ACAATGCTAA AACCATAATA GTACAGCTCA     AAGAACCAGT AAAAATTAAT       ACCGTCAGAT CGTCCTCTTC TCCATCATTA ATCTAGACTT TTAAAGTGCT TGTTACGATT     TTGGTATTAT CATGTCGAGT TTCTTGGTCA TTTTTAATTA      268 G  S  L   A  G  E  E V  V  I   R  S  E   N  F  T  N N  A  K   T  I     I   V  Q  L  K E  P  V   K  I  N                                                  bst1107I       bsp1407I   accI      scfI                                                   901 TGTACAAGAC     CCAACAACAA TACAAGAAAA AGTATACCTA TAGGACCAGG GAGAGCATTT TATGCAACAG     GCGACATAAT AGGAAATATA AGACAAGCAC       ACATGTTCTG GGTTGTTGTT ATGTTCTTTT TCATATGGAT ATCCTGGTCC CTCTCGTAAA     ATACGTTGTC CGCTGTATTA TCCTTTATAT 
TCTGTTCGTG      301 C  T  R  P N  N  N   T  R  K   S  I  P  I G  P  G   R  A  F   Y  A     T  G D  I  I   G  N  I   R  Q  A      H                                                 eco81I                bsu36I/                mstII/                sauI      1001 ATTGTAACCT TAGTAGAACA GACTGGAATA ACACTTTAAG ACAGATAGCT GAAAAATTAA     GAAAACAATT TGGGAATAAA ACAATAATCT TTAATCACTC       TAACATTGGA ATCATCTTGT CTGACCTTAT TGTGAAATTC TGTCTATCGA CTTTTTAATT     CTTTTGTTAA ACCCTTATTT TGTTATTAGA AATTAGTGAG      335 C  N  L   S  R  T   D  W  N  N T  L  R   Q  I  A   E  K  L  R K  Q     F   G  N  K   T  I  I  F N  H  S                                               ppuMI       eco0109I/dralI    apoI   munI scaI bsmI      1101 CTCAGGAGGG GACCCAGAAA TTGTAATGCA CAGTTTTAAT TGTAGAGGGG AATTTTTCTA     CTGTGATACA ACACAATTGT TTAACAGTAC TTGGAATGCA       GAGTCCTCCC CTGGGTCTTT AACATTACGT GTCAAAATTA ACATCTCCCC TTAAAAAGAT     GACACTATGT TGTGTTAACA AATTGTCATG AACCTTACGT      368 S  G  G   D  P  E  I V  M  H   S  F  N   C  R  G  E F  F  Y   C  D     T   T  Q  L  F N  S  T   W  N  A                                                      nspI              nspHI              aflIII      1201 AATAACACTG AAAGGAATAG CACTAAAGAG AATAGCACAA TCACACTCCC ATGCAGAATA     AAACAAATTG TAAACATGTG GCAGGAAGTA GGAAAAGCAA       TTATTGTGAC TTTCCTTATC GTGATTTCTC TTATCGTGTT AGTGTGAGGG TACGTCTTAT     TTTGTTTAAC ATTTGTACAC CGTCCTTCAT CCTTTTCGTT      401 N  N  T  E R  N  S   T  K  E   N  S  T  I T  L  P   C  R  I   K  Q     I  V N  M  W   Q  E  V   G  K  A      M                                           mamI          bsaBI sspI     bsaI      1301 TGTATGCCCC TCCCATCAGA GGACAAATTA GATGTTCATC AAATATTACA GGGTTGCTAT     TAACAAGAGA TGGAGGTAGT AGCAACAGCA TGAATGAGAC       ACATACGGGG AGGGTAGTCT CCTGTTTAAT CTACAAGTAG TTTATAATGT CCCAACGATA     ATTGTTCTCT ACCTCCATCA TCGTTCTCGT ACTTACTCTG      435 Y  A  P   P  I  R   G  Q  I  R C  S  S   N  I  T   G  L  L  L T  R     D   G  G  S   S  N  S  M N  E  T                                            
    gsuI/bpmI       ecoS7I ecoNI   munI      styI      1401 CTTCAGACCT GGAGGAGGAG ATATGAGGGA CAATTGGAGA AGTGAATTAT ACAAATATAA     AGTAGTAAAA ATTGAACCAT TAGGAGTAGC ACCCACCAAG       GAAGTCTGGA CCTCCTCCTC TATACTCCCT GTTAACCTCT TCACTTAATA TGTTTATATT     TCATCATTTT TAACTTGGTA ATCCTCATCG TGGGTGGTTC      468 F  R  P   G  G  G  D M  R  D   N  W  R   S  E  L  Y K  Y  K  A  V     V  K   I  E  P  L G  V  A   P  T  K                                            earI/ksp632I     styI      1501 GCAATGAGAA GAGTGGTGCA GAGAGAAAAA AGAGCAGTGG GAATAGGAGC TGTGTTCCTT     GGGTTCTTAG GAGCAGCAGG AAGCACTATG GGCGCAGCGT       CGTTACTCTT CTCACCACGT CTCTCTTTTT TCTCGTCACC CTTATCCTCG ACACAAGGAA     CCCAAGAATC CTCGTCGTCC TTCGTGATAC CCGCGTCGCA      501 A  M  R  R V  V  Q   R  E  K   R  A  V  G I  G  A   V  F  L   G  F     L  G A  A  G   S  T  M  G  A  A      S                                          haeI       alwNI      1601 CAATAACGCT GACGGTACAG GCCAGACTAT TATTGTCTGG TATAGTGCAA CAGCAGAACA     ATTTGCTGAG GGCTATTGAG GCGCAACAGC ATCTGTTGCA       GTTATTGCGA CTGCCATGTC CGGTCTGATA ATAACAGACC ATATCACGTT GTCGTCTTGT     TAAACGACTC CCGATAACTC CGCGTTGTCG TAGACAACGT      535 I  T  L   T  V  Q   A  R  L  L L  S  G   I  V  Q   Q  Q  N  N L  L     `R  A  I  E   A  Q  Q  H L  L  Q                                                    eco81I  alwNI         gsuI/bpmI   bsu36I/mstII/sauI      1701 ACTCACAGTC TGGGGCATCA AGCAGCTCCA GGCAAGAGTC CTGGCTGTGG AAAGATACCT     AAGGGATCAA CAGCTCCTGG GGATTTGGGG TTGCTCTGGA       TGAGTGTCAG ACCCCGTAGT TCGTCGAGGT CCGTTCTCAG GACCGACACC TTTCTATGGA     TTCCCTAGTT GTCGAGGACC CCTAAACCCC AACGAGACCT      568 L  T  V   W  G  I  K Q  L  Q   A  R  V   L  A  V  E R  Y  L   R  D     Q   Q  L  L  G I  W  G   C  S  G                                                 styI bsmI   xbaI      1801 AAACTCATTT GCACCACCTC TGTGCCTTGG AATGCTAGTT GGAGTAATAA ATCTCTAGAT     AAGATTTGGG ATAACATGAC CTGGATGGAG TGGGAAAGAG       TTTGAGTAAA CGTGGTGGAG ACACGGAACC TTACGATCAA CCTCATTATT TAGAGATCTA     TTCTAAACCC TATTGTACTG 
GACCTACCTC ACCCTTTCTC      601 K  L  I  C T  T  S   V  P  W  N  A  S  W S  N  K  S  L  D   K  I  W      D N  M  T   W  M  E   W  E  R      E                                           hindllI      1901 AAATTGAGAA TTACACAAGC TTAATATACA CCTTAATTGA AGAATCGCAG AACCAACAAG     AAAAGAATAA ACAAGACTTA TTGGAATTGG ATCAATAGGC       TTTAACTCTT AATGTGTTCG AATTATATGT GGAATTAACT TCTTAGCGTC TTGGTTGTTC     TTTTCTTATT TGTTCTGAAT AACCTTAACC TAGTTATCCG      635 I  E  N   Y  T  S   L  I  Y  T L  I  E   E  S  Q   N  Q  Q  E K  N     K   Q  D  L   L  E  L  D Q  O  A                                                   sspI      2001 AAGTTTGTGG AATTGGTTTA GCATAACAAA ATGGCTGTGG TATATAAAAA TATTCATAAT     GATAGTTGGA GGCTTGGTAG GTTTAAGAAT AGTTTTTGCT       TTCAAACACC TTAACCAAAT CGTATTGTTT TACCGACACC ATATATTTTT ATAAGTATTA     CTATCAACCT CCGAACCATC CAAATTCTTA TCAAAAACGA      668 S  L  W   N  W  F  S I  T  K   W  L  W   Y  I  K  I F  I  M   I  V     G   G  L  V  G L  R  I   V  F  A                                                      ppuMI       scfI      avaI eco0109I/draII      2101 GTACTTTCTA TAGTGAATAG AGTTAGGCAG GGGTACTCAC CATTATCATT TCAGACCCGC     CTCCCAGCCC CGAGGGGACC CGACAGGCCC AAAGGAATCG       CATGAAAGAT ATCACTTATC TCAATCCGTC CCCATGAGTG GTAATAGTAA AGTCTGGGCG     GAGGGTCGGG GCTCCCCTGG GCTGTCCGGG TTTCCTTAGC      701 V  L  S  I V  N  R   V  R  Q   G  Y  S  P L  S  F   Q  T  R   L  P     A  P R  G  P   D  R  P   K  G  I      E                                          xcmI       eco57I         bstYI/xhoII       earI/ksp632I      2201 AAGAAGAAGG TGGAGAGCAA GACAGGGACA GATCCATTCG CTTAGTGGAT GGATTCTTAG     CACTTATCTG GGACGATCTA CGGAGCCTGT GCCTCTTCAG       TTCTTCTTCC ACCTCTCGTT CTGTCCCTGT CTAGGTAAGC GAATCACCTA CCTAAGAATC     GTGAATAGAC CCTGCTAGAT GCCTCGGACA CGGAGAAGTC      735 E  E  G   G  E  Q   D  R  D  R S  I  R   L  V  D   G  F  L  A L  I     W   D  D  L   R  S  L  C L  F  S                                                       sspI scfI      2301 CTACCACCGC TTGAGAGACT TACTCTTGAT TGCAACGAGG 
ATTGTGGAAC TTCTGGGACG     CAGGGGGTGG GAAGCCCTCA AATATTGGTG GAATCTCCTA       GATGGTGGCG AACTCTCTGA ATGAGAACTA ACGTTGCTCC TAACACCTTG AAGACCCTGC     GTCCCCCACC CTTCGGGAGT TTATAACCAC CTTAGAGGAT      768 Y  H  R   L  R  D  L L  L  I   A  T  R   I  V  E  L L  G  R   R  G     W   E  A  L  K Y  W  W   N  L  L                                                     alwNI  xbaI      2401 CAGTATTGGA TTCAGGAACT AAAGAATAGT GCTGTTAGCT TGCTTAATGT CACAGCCATA     GCAGTAGCTG AGGGGACAGA TAGGGTTCTA GAAGCATTGC       GTCATAACCT AAGTCCTTGA TTTCTTATCA CGACAATCGA ACGAATTACA GTGTCGGTAT     CGTCATCGAC TCCCCTGTCT ATCCCAAGAT CTTCGTAACG      801 Q  Y  W  I Q  E  L   K  N  S   A  V  S  L L  N  V   T  A  I   A  V     A  E G  T  D   R  V  L   E  A  L      Q                                       2501 AAAGAGCTTA TAGAGCTATT     CTCCACATAC CTACAAGAAT AAGACAAGGC TTGGAAAGGG CTTTGCTATA A       TTTCTCGAAT ATCTCGATAA GAGGTGTATG GATGTTCTTA TTCTGTTCCG AACCTTTCCC     GAAACGATAT T      835 R  A  Y   R  A  I   L  H  I  P T  R  I   R  Q  G   L  E  R  A L  L     O

Table 3 illustrates the amino acid sequences of the GNE₈ and different GNE₁₆ gp120 proteins. Regions of identical amino acid sequence are enclosed in boxes. Note: the "X" at position 666 of sequence gp160.SF16.7 is a stop codon.

TABLE 3: Amino acid sequences of the GNE₈ and GNE₁₆ gp120 proteins

Nucleic acid sequences encoding gp120 from GNE₈ and GNE₁₆ that are capable of expressing gp120 can be prepared by conventional means. The nucleotide sequence can be synthesized. Alternatively, another HIV nucleic acid sequence encoding gp120 can be used as a backbone and altered at any differing residues by site-directed mutagenesis, as described in detail in Example 1.
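When an existing gp120 sequence serves as the backbone, the residues to change are simply the positions at which the backbone and the target differ. The Python sketch below lists those positions, assuming the two protein sequences are already aligned to the same length and numbered from the initiator methionine as position 1. The function name `differing_residues` and the toy sequences are hypothetical and are not GNE₈ or GNE₁₆ data.

```python
def differing_residues(backbone: str, target: str) -> list[tuple[int, str, str]]:
    """List (position, backbone residue, target residue) for every mismatch.

    Positions are 1-based, with the initiator methionine as residue 1.
    Assumes the two sequences are already aligned to the same length.
    """
    if len(backbone) != len(target):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, old, new)
        for pos, (old, new) in enumerate(zip(backbone, target), start=1)
        if old != new
    ]


# Hypothetical toy sequences (not real gp120 data).
for pos, old, new in differing_residues("MRVKGIRKNY", "MRVKEIRRNY"):
    print(f"{old}{pos}{new}")  # each line is one substitution to introduce
```

For the toy pair above this prints `G5E` and `K8R`; each entry would correspond to one mutagenic primer pair in a site-directed mutagenesis scheme.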

In a preferred embodiment, the nucleotide sequence is present in an expression construct containing DNA encoding gp120 under the transcriptional and translational control of a promoter for expression of the encoded protein. The promoter can be a eukaryotic promoter for expression in a mammalian cell. In cases where one wishes to expand the promoter or produce gp120 in a prokaryotic host, the promoter can be a prokaryotic promoter. Usually a strong promoter is employed to provide high-level transcription and expression.

The expression construct can be part of a vector capable of stable extrachromosomal maintenance in an appropriate cellular host, or it may be integrated into the host genome. Normally, markers that allow selection of a host containing the construct are provided with the expression construct. The marker can be on the same DNA molecule as the construct or on a different one; the same DNA molecule is preferred.

The expression construct can be joined to a replication system recognized by the intended host cell. Suitable replication systems include viral replication systems such as those of retroviruses, simian virus, bovine papilloma virus, or the like. In addition, the construct may be joined to an amplifiable gene, e.g. the DHFR gene, so that multiple copies of the gp120 DNA can be made. The method of introducing the construct into the host will vary with the construct and can be any convenient means. A wide variety of prokaryotic and eukaryotic hosts can be employed to express the proteins.

Preferably, gp120 is expressed in mammalian cells that provide the same glycosylation and disulfide bonds as in native gp120. Expression of gp120 and of gp120 fragments in mammalian cells as fusion proteins incorporating N-terminal sequences of Herpes Simplex Virus Type 1 (HSV-1) glycoprotein D (gD-1) is described in Lasky, L. A. et al., 1986 (Neutralization of the AIDS retrovirus by antibodies to a recombinant envelope glycoprotein) Science 233: 209-212 and Haffar, O. K. et al., 1991 (The cytoplasmic tail of HIV-1 gp160 contains regions that associate with cellular membranes.) Virol. 180:439-441, respectively. A preferred method for expressing gp120 is described in Example 3. In that example, a heterologous signal sequence was used for convenient expression of the protein; however, the protein can also be expressed using the native signal sequence.

Isolated, purified GNE₈-gp120 and GNE₁₆-gp120 having the amino acid sequences illustrated in Tables 1-3 can be produced by conventional methods. For example, the proteins can be chemically synthesized. In a preferred embodiment, the proteins are expressed in mammalian cells using an expression construct of this invention. The expressed proteins can be purified by conventional means; a preferred purification procedure is described in Example 3.

gp120 Fragments

The present invention also provides gp120 fragments that are suitable for inducing antibodies for use in serotyping or in a vaccine formulation. A truncated gp120 sequence, as used herein, is a fragment of gp120 that lacks a portion of the intact gp120 sequence beginning at either the amino or the carboxy terminus. A truncated gp120 sequence of this invention is free from the C5 domain. The C5 domain is a major immunogenic site of the molecule, but antibodies to this region do not neutralize virus. Eliminating this portion of gp120 from immunogens used to induce antibodies for serotyping is therefore advantageous.

In another embodiment, the truncated gp120 sequence is additionally free from the region extending from the carboxy terminus through about amino acid residue 453 of the gp120 V5 domain. The portion of the V5 domain remaining in the sequence provides a convenient restriction site for preparing expression constructs. However, a truncated gp120 sequence that is free from the entire gp120 V5 domain is also suitable for inducing antibodies.

Portions of the amino terminus of gp120 can also be eliminated from the truncated gp120 sequence. The truncated gp120 sequence can additionally be free from the gp120 signal sequence. The truncated gp120 sequence can be free from the amino terminus through amino acid residue 111 of the gp120 C1 domain, eliminating most of the C1 domain but preserving a convenient restriction site. However, the portion of the C1 domain through the cysteine residue that forms a disulfide bond can additionally be removed, so that the truncated gp120 sequence is free from the amino terminus through amino acid residue 117 of the gp120 C1 domain. Alternatively, the truncated gp120 sequence can be free from the amino terminus of gp120 through residue 111 of the C1 domain, preserving the V2 disulfide bond. In a preferred embodiment, the truncated gp120 sequence is free from the amino terminus of gp120 through residue 111 of the C1 domain and from residue 453 through the carboxy terminus of gp120.
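The residue boundaries above translate directly into a slice of the protein sequence. The sketch below assumes residues are numbered from the initiator methionine as position 1 (the convention stated in Example 1), so the preferred fragment keeps residues 112 through 452 inclusive. The function name `truncate_gp120` and the placeholder sequence are hypothetical.

```python
def truncate_gp120(seq: str, first_kept: int = 112, last_kept: int = 452) -> str:
    """Return residues first_kept..last_kept (inclusive, 1-based).

    The defaults drop the amino terminus through residue 111 and residue 453
    through the carboxy terminus (the preferred embodiment); pass
    first_kept=118 to also drop residues 112-117 of the C1 domain.
    """
    if not 1 <= first_kept <= last_kept <= len(seq):
        raise ValueError("residue range lies outside the sequence")
    # Convert 1-based inclusive positions to a 0-based half-open slice.
    return seq[first_kept - 1:last_kept]


# Hypothetical placeholder sequence: 111 'A', 341 'C', 48 'D' (500 residues).
toy = "A" * 111 + "C" * 341 + "D" * 48
print(len(truncate_gp120(toy)))  # 341
```

The off-by-one conversion in the slice is the main thing to get right: residue 112 is index 111, and the slice end 452 is exclusive in 0-based terms, which includes residue 452.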

The truncated gp120 sequences can be produced by recombinant engineering, as described previously. Conveniently, DNA encoding the truncated gp120 sequence is joined to a heterologous DNA sequence encoding a signal sequence.

Serotyping Method

The present invention also provides an improved serotyping method for HIV strains. The method comprises determining the serotypes of the V2, V3, and C4 domains of gp120.

HIV isolates can be serotyped by conventional immunoassay methods employing antibodies to the neutralizing epitopes in the V2, V3, and C4 domains of various HIV strains. Preparation of the antibodies is described hereinbefore. The antibody affinity required for serotyping HIV with a particular immunoassay does not differ from that required to detect other polypeptide analytes. The antibody composition can be polyclonal or monoclonal, preferably monoclonal.

Many types of immunoassays, using a variety of protocols and labels, are well known. The assay conditions and reagents may be any of those found in the prior art, and the assay may be heterogeneous or homogeneous. Conveniently, an HIV isolate is adsorbed to a solid phase and, for each neutralizing epitope in the V2, V3, and C4 domains, detected with antibodies each specific for the epitope of one strain. Alternatively, supernatant or lysate from the cultured isolate that contains gp120 can be adsorbed to the solid phase. The virus or gp120 can be adsorbed by many well-known non-specific binding methods. Alternatively, an anti-gp120 antibody, preferably one directed to the carboxy terminus of gp120, can be used to affix gp120 to the solid phase. A gp120 capture antibody and sandwich ELISA assay for gp120 neutralizing epitopes is described by Moore, AIDS Res. Hum. Retroviruses 9:209-219 (1993). Binding between the antibodies and the sample can be determined in a number of ways. Complex formation can be determined by use of soluble antibodies specific for the anti-gp120 antibody. The soluble antibodies can be labeled directly or can be detected using labeled second antibodies specific for the species of the soluble antibodies. Suitable labels include radionuclides, enzymes, fluorescers, colloidal metals or the like. Conveniently, the anti-gp120 antibodies will be labeled directly, for example with an enzyme.

Alternatively, other methods for determining the neutralizing epitopes can be used. For example, fluorescently labeled antibodies to a neutralizing epitope can be combined with cells infected by the HIV strain to be serotyped, and the cells analyzed by fluorescence-activated cell sorting.

The serotype of the HIV isolate includes the strain of the neutralizing epitopes for each of the V2, V3, and C4 domains.
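Because the serotype is defined domain by domain, the result of such an immunoassay panel can be recorded as one strain call each for V2, V3, and C4. The sketch below is a hypothetical way to make those calls from background-corrected signals. The highest-signal rule, the `cutoff` parameter, and the names `call_serotype` and `DOMAINS` are assumptions for illustration; the text does not specify how a positive reaction is scored.

```python
from typing import Dict, Optional

DOMAINS = ("V2", "V3", "C4")


def call_serotype(
    readings: Dict[str, Dict[str, float]], cutoff: float
) -> Dict[str, Optional[str]]:
    """Assign a strain to each neutralizing-epitope domain.

    readings[domain][strain] is the background-corrected signal obtained with
    the antibody specific for that strain's epitope in that domain. The
    strongest signal at or above `cutoff` wins; otherwise the call is None.
    """
    serotype: Dict[str, Optional[str]] = {}
    for domain in DOMAINS:
        signals = readings.get(domain, {})
        best = max(signals, key=signals.get, default=None)
        if best is not None and signals[best] >= cutoff:
            serotype[domain] = best
        else:
            serotype[domain] = None
    return serotype


# Hypothetical readings; strain labels reuse names from this document.
panel = {
    "V2": {"MN": 1.2, "IIIB": 0.1},
    "V3": {"MN": 0.9, "A244": 0.2},
    "C4": {"MN": 0.05, "IIIB": 0.04},
}
print(call_serotype(panel, cutoff=0.3))
```

A `None` call flags a domain whose epitope was not recognized by any antibody in the panel, which in practice would suggest a strain not yet represented among the typing reagents.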

It is understood that the application of the teachings of the present invention to a specific problem or situation will be within the capabilities of one having ordinary skill in the art in light of the teachings contained herein. Examples of the products of the present invention and representative processes for their isolation, use, and manufacture appear below, but should not be construed to limit the invention. All literature citations herein are expressly incorporated by reference.

EXAMPLE 1 Identification of C4 Neutralizing Epitopes

The following reagents and methods were used in the studies described herein.

gp120 sequences and nomenclature

Amino acid residues are designated using the standard single letter code. The location of amino acids within the gp120 protein is specified using the initiator methionine residue as position 1. The designation LAI is used to describe the virus isolate from which the HIV-1_(BH10), HIV-1_(MB), HIV-1_(BRU), HIV-1_(HXB2), HIV-1_(HXB3) and HIV-1_(HXB10) substrains (molecular clones) of HIV-1 were obtained. The sequence of gp120 from the IIIB substrain of HIV-1_(LAI) is that determined by Muesing et al. (30).

The sequence of gp120 from the MN strain of HIV-1 is given with reference to the MNgp120 clone (MN_(GNE)). The sequence of this clone differs by approximately 2% from that of the MN₁₉₈₄ clone described by Gurgo et al. (13). The sequences of gp120 from the NY-5, JRcsf, Z6, Z321, and HXB2 strains of HIV-1 are those listed by Myers et al. (32) except where noted otherwise. The sequence of the Thai isolate A244 is that provided by McCutchan et al. (24). The variable (V) domains and conserved (C) domains of gp120 are specified according to the nomenclature of Modrow et al. (28).

Monoclonal antibody production and screening assays

Hybridomas producing monoclonal antibodies to MN-rgp120 (recombinantly produced gp120 from the MN strain of HIV) (3) were prepared and screened for CD4-blocking activity as described previously (7, 33). The binding of monoclonal antibodies to MN-rgp120 and to rgp120s from the IIIB, NY-5, Z6, Z321, JRcsf, and A244 strains of HIV-1 was assessed by enzyme-linked immunosorbent assays (ELISA) as described previously (33).

Virus binding and neutralization assays

The ability of monoclonal antibodies to neutralize HIV-1 infectivity in vitro was assessed in a colorimetric MT-2 cell cytotoxicity assay similar to that described previously (35). MT-2 cells and H9/HTLV-III_(MN) cells were obtained through the AIDS Research and Reference Reagent Program, Division of AIDS, NIAID, NIH, contributed by Drs. Douglas Richman and Robert Gallo, respectively. Briefly, serial dilutions of antibody or serum were prepared in 50 μl volumes of complete medium, and then 50 μl of a prediluted HIV-1 stock was added to each well. After incubation for 1 hr at 4° C., 100 μl of a 4×10⁵ MT-2 cell/ml suspension was added. After incubation of the plates for 5 days at 37° C. in 5% CO₂, viable cells were measured by metabolic conversion of the MTT dye to formazan. Each well received 20 μl of a 5 mg/ml MTT solution in PBS.

After a 4 hr incubation at 37° C., the dye precipitate was dissolved by removing 100 μl of the cell supernatant, adding 130 μl of 10% Triton X-100 in acid isopropanol, and then pipetting until the precipitate was dissolved. The optical density of the wells was determined at 540 nm. The percentage inhibition was calculated using the formula:

    1-(virus control-experimental)/(virus control-medium control)

Cell surface staining of HIV-1 infected cells with monoclonal antibodies

H9 cells (2×10⁵) chronically infected with the IIIB, HXB2, HXB3, and HX10 substrains of HIV-1_(LAI) or with HIV-1_(MN) were incubated for 30 min at room temperature with monoclonal antibodies (10 μg per ml) in 100 μl of RPMI 1640 cell culture medium containing 1% FCS. Cells were washed and then incubated for 30 min with 20 μg per ml of fluorescein-conjugated, affinity-purified goat antibody to mouse IgG (Fab')₂ (Cappel, West Chester, Pa.). Cells were washed and fixed with 1% paraformaldehyde, and the bound antibody was quantitated by flow cytometry using a FACScan (Becton Dickinson, Fullerton, Calif.).

Fluorescence data were expressed as the percentage of fluorescent cells relative to the fluorescence obtained with the second antibody alone. Fluorescence was measured as the mean intensity of the cells, expressed as mean channel number plotted on a log scale.

Fragmentation of the MN-rgp120 gene

Fragments of the MN-rgp120 gene were generated using the polymerase chain reaction (PCR) (17). Briefly, forward 30-mer oligonucleotide DNA primers incorporating an Xho I site, and reverse 36-mer oligonucleotide DNA primers containing a stop codon followed by an Xba I site, were synthesized and used for the polymerase chain reactions. Thirty PCR cycles were performed using 0.3 μg of a plasmid containing the gene for gp120 from the MN strain of HIV-1 (pRKMN.D533) and 0.04 nM of the designated primers. The PCR reaction buffer consisted of 0.1M Tris buffer (pH 8.4), 50 mM KCl, 0.2 mM 4dNTP (Pharmacia, Piscataway, N.J.), 0.15M MgCl₂ and 0.5 unit of Taq polymerase (Perkin-Elmer Cetus, Norwalk, Conn.). A typical PCR cycle consisted of a 60 second denaturation step at 94° C., followed by a 45 second annealing step at 55° C., and then a 45 second extension step at 72° C.
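The primer layout described above (a forward primer carrying an Xho I site, a reverse primer carrying a stop codon and an Xba I site) can be sketched in Python. The two-base 5' "clamp", the choice of TAA as the stop codon, and the annealing lengths of 22 and 25 bases are assumptions picked so the totals come to the stated 30-mer and 36-mer; the function names and template are hypothetical and are not the actual primers used.

```python
XHO_I = "CTCGAG"  # XhoI recognition site (C^TCGAG)
XBA_I = "TCTAGA"  # XbaI recognition site (T^CTAGA)
STOP = "TAA"      # one of the three stop codons, chosen for illustration
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def forward_primer(gene: str, start: int, anneal: int = 22, clamp: str = "GC") -> str:
    """5' clamp + XhoI site + `anneal` sense bases beginning at `start` (0-based)."""
    return clamp + XHO_I + gene[start:start + anneal].upper()


def reverse_primer(gene: str, end: int, anneal: int = 25, clamp: str = "GC") -> str:
    """Reverse complement of: `anneal` sense bases ending at `end` (exclusive),
    a stop codon, the XbaI site and a clamp, so the clamp lands at the 5' end."""
    sense = gene[end - anneal:end].upper() + STOP + XBA_I + clamp
    return reverse_complement(sense)


# Hypothetical in-frame template (not the MN gp120 sequence).
gene = "ATG" + "GCT" * 40
fwd = forward_primer(gene, 0)
rev = reverse_primer(gene, len(gene))
print(len(fwd), len(rev))  # 30 36
```

Building the reverse primer on the sense strand first and then taking the reverse complement keeps the stop codon in frame with the insert, which is easy to get wrong when writing the antisense sequence by hand.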

Following PCR amplification, the PCR products were purified by phenol and chloroform extraction and then ethanol precipitated. The purified products were then digested with the restriction endonucleases Xho I and Xba I. The resulting fragments were gel purified by 1% agarose (SEAKEM, FMC Bioproducts, Rockland, Me.) or 5% polyacrylamide gel electrophoresis (PAGE) and then isolated by electroelution.
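Before digestion it can be useful to confirm that each PCR product carries exactly one Xho I and one Xba I site, at its ends, and to predict the fragments the double digest will release. The sketch below finds recognition sites and splits a linear sequence at the top-strand cut positions (C^TCGAG for Xho I, T^CTAGA for Xba I). It ignores the bottom-strand overhangs, and the product sequence and function names are hypothetical.

```python
import re

# Recognition site and top-strand cut offset (enzyme cuts after this many bases).
ENZYMES = {
    "XhoI": ("CTCGAG", 1),  # C^TCGAG
    "XbaI": ("TCTAGA", 1),  # T^CTAGA
}


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of `site`; the lookahead also reports overlaps."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def digest_linear(seq: str, enzyme_names: list[str]) -> list[str]:
    """Top-strand fragments of a linear molecule cut by the named enzymes.

    Sticky-end overhangs on the bottom strand are ignored in this sketch.
    """
    cuts = set()
    for name in enzyme_names:
        site, offset = ENZYMES[name]
        cuts.update(pos + offset for pos in find_sites(seq, site))
    bounds = [0, *sorted(cuts), len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


# Hypothetical PCR product: clamp, XhoI, insert, stop codon, XbaI, clamp.
product = "GC" + "CTCGAG" + "ATGGCTGCT" + "TAA" + "TCTAGA" + "GC"
for fragment in digest_linear(product, ["XhoI", "XbaI"]):
    print(fragment)
```

An internal Xho I or Xba I site in the insert would show up as an extra fragment here, which is the situation to rule out before relying on the double digest.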

Site-directed mutagenesis of the MN-rgp120 C4 domain

A recombinant PCR technique (15) was used to introduce single amino acid substitutions at selected sites into a 600 bp Bgl II fragment of MN-rgp120 that contained the C4 domain. This method entailed PCR amplification of overlapping regions of the C4 domain of gp120 using primers that incorporated the desired nucleotide changes. The resulting PCR products were then annealed and PCR amplified to generate the final product. For these reactions, 18-mer "outside" primers encoding the wild type sequence (Bgl II sites) were paired with 36-mer "inside" primers that contained the alanine or glutamic acid residue changes. The first PCR reaction included 1× Vent polymerase buffer (New England Biolabs, Beverly, Mass.), 0.2 mM 4dNTP (Pharmacia, Piscataway, N.J.), 0.04 nM of each synthetic oligonucleotide, and 0.3 μg of linearized plasmid pRKMN.D533, which contained the MN-rgp120 gene. Thirty PCR cycles were performed, each consisting of 45 seconds of denaturation at 94° C., 45 seconds of annealing at 55° C. and 45 seconds of extension at 72° C. Following PCR amplification, the product pairs were gel purified using 1% low melt agarose (SeaPlaque, FMC Bioproducts, Rockland, Me.).

The agarose containing the PCR product was melted at 65° C., combined with the PCR product of the overlapping pair, and equilibrated to 37° C. To this mixture (20 μl) were added 10 μl of 10× Vent polymerase buffer, 10 μl of 2 mM 4dNTP, 0.04 nM each of the "outside" wild type 18-mer oligonucleotides, 57 μl of H₂O and 1 unit of Vent polymerase. Thirty PCR cycles were performed as described above.
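In this overlap-extension scheme, the first-round reactions pair one outside primer with the opposite-strand inside primer, so the two products overlap across the mutated codon and can be joined in the second round with the outside primers alone. The sketch below designs a complementary pair of 36-mer inside primers centered on one codon. The specific codons (GCT for alanine, GAA for glutamic acid), the 16/17-base flank split, and the function names are assumptions for illustration; the actual primer sequences are not given in the text.

```python
ALA = "GCT"  # one of the four alanine codons (GCN), chosen for illustration
GLU = "GAA"  # one of the two glutamic acid codons (GAR)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def inside_primers(cds: str, residue: int, new_codon: str, length: int = 36) -> tuple[str, str]:
    """Return (sense, antisense) mutagenic inside primers centered on `residue`.

    `residue` is 1-based from the first codon of `cds`; the sense primer carries
    `new_codon` in place of the wild-type codon, flanked by wild-type sequence.
    """
    start = (residue - 1) * 3
    upstream = (length - 3) // 2          # 16 bases 5' of the codon for a 36-mer
    downstream = length - 3 - upstream    # 17 bases 3' of the codon
    if start - upstream < 0 or start + 3 + downstream > len(cds):
        raise ValueError("codon too close to a sequence end for this primer length")
    mutant = cds[:start] + new_codon + cds[start + 3:]
    sense = mutant[start - upstream:start + 3 + downstream].upper()
    return sense, reverse_complement(sense)


# Hypothetical coding sequence (not MN-rgp120): Met followed by 20 Lys codons.
cds = "ATG" + "AAA" * 20
sense, antisense = inside_primers(cds, 10, ALA)
print(sense)
print(antisense)
```

In use, the antisense primer would pair with the upstream outside primer and the sense primer with the downstream outside primer in the first round.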

The resulting PCR products were purified and digested with Bgl II endonuclease. The digested PCR product was then ligated into the mammalian cell expression vector pRKMN.D533, which had been digested with Bgl II to remove a 600 bp fragment. Colonies containing the correct insertion were identified, and Sequenase 2.0 supercoil sequencing was used to check fidelity and incorporation of the desired mutation.
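Cutting a circular plasmid at two Bgl II sites releases the insert-sized fragment and leaves the vector backbone, which is how a 600 bp fragment can be swapped out. The sketch below computes fragment sizes for a circular molecule from its cut positions. The plasmid length and cut coordinates are hypothetical; the text gives only the 600 bp fragment size.

```python
def circular_fragment_sizes(cut_positions: list[int], plasmid_length: int) -> list[int]:
    """Fragment lengths (bp) after cutting a circular plasmid at the given positions."""
    cuts = sorted(set(cut_positions))
    if not cuts:
        return [plasmid_length]  # uncut: a single circular molecule
    sizes = [b - a for a, b in zip(cuts, cuts[1:])]
    sizes.append(plasmid_length - cuts[-1] + cuts[0])  # fragment spanning the origin
    return sorted(sizes)


# Hypothetical Bgl II cut coordinates in a hypothetical 5000 bp plasmid.
print(circular_fragment_sizes([1000, 1600], 5000))  # [600, 4400]
```

The fragment that wraps around the origin is the step most often forgotten; a single cut simply linearizes the plasmid at full length.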

Expression of gp120 fragments in mammalian cells

Fragments of MN and IIIB gp120 were expressed in mammalian cells as fusion proteins incorporating N-terminal sequences of Herpes Simplex Virus Type 1 (HSV-1) glycoprotein D (gD-1), as described previously (14, 22). Briefly, isolated DNA fragments generated by the PCR reaction were ligated into a plasmid (pRK.gD-1) designed to fuse the gp120 fragments, in frame, to the 5' sequences of the glycoprotein D (gD) gene of Type 1 Herpes Simplex Virus (gD-1) and their 3' ends to translational stop codons. The fragment of the gD-1 gene encoded the signal sequence and 25 amino acids of the mature form of the HSV-1 protein. To allow expression in mammalian cells, the chimeric gene fragments were cloned into the pRK5 expression plasmid (8), which contained a polylinker with cloning sites and translational stop codons located between a cytomegalovirus promoter and a simian virus 40 polyadenylation site.
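For the fusion to work, the gD-1 leader (signal sequence plus 25 mature residues) must be a whole number of codons, and the joined sequence must contain no stop codon before its end. The sketch below performs that check on a leader and an insert; the sequences and the function name `is_in_frame_fusion` are hypothetical, and a stop codon in the final position is tolerated.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_in_frame_fusion(leader: str, insert: str) -> bool:
    """True if `insert` continues the leader's reading frame with no premature stop.

    Requires the leader and the fused sequence to be whole numbers of codons
    and no stop codon before the last codon of the fusion.
    """
    fusion = (leader + insert).upper()
    if len(leader) % 3 or len(fusion) % 3:
        return False
    codons = [fusion[i:i + 3] for i in range(0, len(fusion), 3)]
    return not any(codon in STOP_CODONS for codon in codons[:-1])


# Hypothetical leader and inserts (not the gD-1 or gp120 sequences).
leader = "ATG" + "GCT" * 5
print(is_in_frame_fusion(leader, "AAA" * 3 + "TAA"))  # True
print(is_in_frame_fusion(leader, "AAATGAAAA"))        # False: internal TGA
```

Checking codons in the leader's frame, rather than scanning for stop triplets anywhere, is what distinguishes a true in-frame stop from a harmless TGA that straddles two codons.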

The resulting plasmids were transfected into the 293s embryonic human kidney cell line (12) using a calcium phosphate technique (11). Growth-conditioned cell culture medium was collected 48 hr after transfection, and the soluble proteins were detected by ELISA or by specific radioimmunoprecipitation, in which metabolically labeled proteins from cell culture supernatants were resolved by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and visualized by autoradiography as described previously (1, 18).

Radioimmunoprecipitation of MN-rgp120 mutants

Plasmids directing the expression of the MN-rgp120 C4 domain mutants were transfected into 293s cells as described above. Twenty-four hours after transfection, the cells were metabolically labeled with [³⁵S]-labeled methionine or cysteine as described previously (1). The labeled cell culture supernatants were then harvested, and 0.5 ml aliquots were reacted with 1-5 μg of the monoclonal antibody or with 2 μl of the polyclonal rabbit antisera to MN-rgp120 and immunoprecipitated with Pansorbin (CalBiochem, La Jolla, Calif.) as described previously (1). The resulting Pansorbin complex was pelleted by centrifugation, washed twice with a solution containing PBS, 1% NP-40 and 0.05% SDS, and then boiled in PAGE sample buffer containing 1% 2-mercaptoethanol. The processed samples were then analyzed by SDS-PAGE and visualized by autoradiography (1, 18).

Assays to measure the binding of monoclonal antibodies to mutagenized MN-rgp120 polypeptides

An ELISA was developed to screen MN-rgp120 fragments and mutant proteins for reactivity with various monoclonal antibodies. In this assay, 96 well microtiter dishes (Maxisorp, Nunc, Roskilde, Denmark) were coated overnight with a mouse monoclonal antibody (5B6) to gD-1 at a concentration of 2.0 μg/ml in phosphate buffered saline (PBS). The plates were blocked in a PBS solution containing 0.5% bovine serum albumin (PBSA) and then incubated for 2 hr at room temperature with growth-conditioned cell culture medium from transfected cells expressing the recombinant gp120 variants. The plates were washed three times in PBS containing 0.05% Tween 20 and then incubated with the purified, HRPO-conjugated monoclonal antibodies. After a 1 hr incubation, the plates were washed three times and developed with the colorimetric substrate o-phenylenediamine (Sigma, St. Louis, Mo.).

The optical density in each well was then read at 492 nm in a microtiter plate reading spectrophotometer. Each cell culture supernatant containing fragments or mutated rgp120s was normalized for expression by titering its reactivity with the V3-binding monoclonal antibody 1034 or with a rabbit polyclonal antiserum to MN-rgp120. Data from these experiments were expressed as the ratio of the optical densities obtained with the CD4 blocking monoclonal antibodies binding to the fragments or MN-rgp120 mutants to those obtained with the full length wild type rgp120s.

To normalize for different concentrations of MN-rgp120-derived protein in the cell culture supernatants, the binding of the CD4 blocking monoclonal antibodies to each preparation was compared to that of an HRPO-conjugated monoclonal antibody to the V3 domain of MN-rgp120 (1034). Data from these experiments were expressed as the ratio of the optical densities obtained with the CD4 blocking monoclonal antibodies to those obtained with the HRPO-conjugated V3-reactive monoclonal antibody.
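The normalization described in the two preceding paragraphs can be written as a ratio of ratios. This Python sketch is illustrative only: the OD values are hypothetical, and the final division by the wild-type ratio is inferred from Table 5, where wild-type MN-rgp120 is reported as 1.0.

```python
def relative_binding(od_mab_mutant: float, od_v3_mutant: float,
                     od_mab_wild_type: float, od_v3_wild_type: float) -> float:
    """Expression-normalized binding of a test MAb to a mutant, relative to wild type.

    Each OD for the CD4 blocking MAb is divided by the OD for the V3-specific
    reference MAb (1034) on the same supernatant; the mutant ratio is then
    divided by the wild-type ratio so that wild type reads 1.0.
    """
    mutant_ratio = od_mab_mutant / od_v3_mutant
    wild_type_ratio = od_mab_wild_type / od_v3_wild_type
    return mutant_ratio / wild_type_ratio


# Hypothetical ODs: a poorly expressed mutant that is still fully recognized...
low_expression = relative_binding(0.40, 0.50, 0.80, 1.00)
# ...and a well expressed mutant that has lost the epitope.
epitope_lost = relative_binding(0.05, 0.50, 1.00, 1.00)
print(f"{low_expression:.2f} {epitope_lost:.2f}")
```

Dividing by the V3 signal first keeps a simple drop in expression from being mistaken for loss of the C4 epitope.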

CD4 binding assays

The ability of monoclonal antibodies to inhibit the binding of MN-rgp120 to recombinant soluble CD4 (rsCD4) was determined in a solid phase radioimmunoassay similar to that described previously (33). The effect of single amino acid substitutions on the binding of MN-rgp120 mutants to CD4 was determined in a co-immunoprecipitation assay similar to that described previously (21). Briefly, 293 cells were metabolically labeled with [³⁵S]-methionine 24 hr after transfection with plasmids expressing MN-rgp120 variants. Growth-conditioned cell culture medium (0.5 ml) was then incubated with 5.0 μg of rsCD4 for 90 minutes at room temperature. Following this incubation, 5.0 μg of an anti-CD4 monoclonal antibody (465), known to bind to an epitope remote from the gp120 binding site, was added and incubated for another 90 minutes at room temperature.

The gp120-CD4-antibody complexes were precipitated with Pansorbin that had been washed with PBS, preabsorbed with 0.1% bovine serum albumin and then bound with 50 μg of an affinity purified rabbit anti-mouse IgG (Cappel, West Chester, Pa.). The pellet was washed twice with PBS containing 1% NP-40 and 0.05% SDS, and then boiled in SDS-PAGE sample buffer containing beta-mercaptoethanol. The immunoprecipitation products were resolved by SDS-PAGE and visualized by autoradiography as described previously (1, 21).

Antibody affinity measurements

Anti-gp120 antibodies were iodinated with Na¹²⁵I using Iodogen (Pierce, Rockford, Ill.). Briefly, 50 μg of antibody in PBS was placed in 1.5 ml polypropylene microcentrifuge tubes coated with 50 μg of Iodogen. Two millicuries of carrier-free Na¹²⁵I were added. After 15 min, free ¹²⁵I was separated from the labeled protein by chromatography on a PD-10 column (Pierce, Rockford, Ill.) pre-equilibrated in PBS containing 0.5% gelatin. Antibody concentrations following iodination were determined by ELISA to calculate specific activities.

For binding assays, 96-well microtiter plates were coated with 100 μl/well of a 10 μg/ml solution of MN-rgp120 or IIIB-rgp120 in 0.1 M bicarbonate buffer, pH 9.6, and incubated for 2 hr at room temperature or overnight at 4° C. To prevent non-specific binding, plates were blocked for 1-2 hr at room temperature with 200 μl/well of a gelatin solution consisting of PBS containing 0.5% (wt/vol) gelatin and 0.02% sodium azide. Unlabeled anti-gp120 monoclonal antibody (0 to 400 nM) was titrated (in duplicate) in situ, and radiolabeled antibody was added to each well at a concentration of 0.5 nM.

After a 1-2 hr incubation at room temperature, the plate was washed 10× with the PBS/0.5% gelatin/0.02% azide buffer to remove free antibody. The antibody-gp120 complexes were solubilized with 0.1 N NaOH/0.1% SDS solution and counted in a gamma counter. The data were analyzed by the method of Scatchard (40) using the Ligand analytical software program (31). Kd values reported represent the means of four independent determinations.
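The Scatchard transform rests on the one-site relation B/F = (Bmax - B)/Kd, so a plot of bound/free against bound is a straight line with slope -1/Kd and x-intercept Bmax. The following Python sketch fits that line by ordinary least squares on synthetic, noise-free data; it does not reproduce the competition design or the Ligand program (31), and the Bmax value is a hypothetical choice.

```python
from typing import List, Tuple


def one_site_bound(free_nm: float, kd_nm: float, bmax_nm: float) -> float:
    """Specifically bound ligand for a single class of non-cooperative sites."""
    return bmax_nm * free_nm / (kd_nm + free_nm)


def scatchard_fit(bound: List[float], free: List[float]) -> Tuple[float, float]:
    """Least-squares line through (B, B/F); returns (Kd, Bmax).

    For one-site binding B/F = (Bmax - B) / Kd, so slope = -1/Kd and the
    x-intercept equals Bmax.
    """
    x = bound
    y = [b / f for b, f in zip(bound, free)]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -1.0 / slope, -intercept / slope


# Synthetic data (hypothetical): Kd = 2.7 nM, Bmax = 1.5 nM.
free = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
bound = [one_site_bound(f, 2.7, 1.5) for f in free]
kd, bmax = scatchard_fit(bound, free)
print(f"Kd = {kd:.2f} nM, Bmax = {bmax:.2f} nM")
```

With real data, curvature in this plot is the usual sign that more than one class of sites, or cooperativity, is present.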

RESULTS

Characterization of monoclonal antibodies to MN-rgp120 that block CD4 binding

Monoclonal antibodies prepared from mice immunized with MN-rgp120 (3, 33) were screened for the ability to bind to MN-rgp120-coated microtiter dishes by ELISA as described previously (33). Of the thirty-five clones obtained, seven were identified (1024, 1093, 1096, 1097, 1110, 1112, and 1127) that were able to inhibit the binding of MN-rgp120 to recombinant CD4 in ELISA (FIG. 1) or in solid phase or cell surface radioimmunoassays (21, 33). Previous studies have shown that two distinct classes of CD4 blocking monoclonal antibodies occur: those that bind to conformation dependent (discontinuous) epitopes (16, 26, 33, 35, 45) and those that bind to conformation independent (sequential) epitopes (4, 7, 21, 33, 43).

To distinguish between these two alternatives, the binding of the monoclonal antibodies to denatured (reduced and carboxymethylated) MN-rgp120 (RCM-gp120) was measured by ELISA as described previously (33). As illustrated in Table 4, below, all of the CD4 blocking monoclonal antibodies reacted with the chemically denatured protein, indicating that they all recognized conformation independent (sequential) epitopes.

                  TABLE 4
______________________________________________________________________
Properties of monoclonal antibodies to MN-rgp120
______________________________________________________________________
MAb    CD4         HIV-1 MN        HIV-1 MN   CM-      C4 domain   rgp120 cross
       inhibitor   neutralization  V3         rgp120   peptides    reactivity
______________________________________________________________________
1024   +           +               -          +        -           2
1093   +           +               -          +        -           2
1096   +           +               -          +        -           2
1097   +           +               -          +        -           2
1110   +           +               -          +        -           2
1112   +           +               -          +        -           2
1127   +           +               -          +        -           2
1026   -           +               +          +        -           1,2,3,4,6
1092   -           -               -          +        -           1,2,3,4,5
1126   -           -               -          +        -           1,2,3,5,7
1086   -           -               -          +        -           2
13H8   +           -               -          +        1,3         1,2,3,4,5,6,7
______________________________________________________________________
rgp120 cross reactivity: 1, IIIB-rgp120; 2, MN-rgp120; 3, NY-5-rgp120;
4, JRcsf-rgp120; 5, Z6-rgp120; 6, Z321-rgp120; 7, A244-rgp120.
C4 domain peptides:
1, FINMWQEVGKAMYAPPIS (SEQ. ID. NO. 24);
2, MWQEVGKAMYAP (SEQ. ID. NO. 25);
3, GKAMYAPPIKGQIR (SEQ. ID. NO. 26).

The cross reactivity of these monoclonal antibodies was assessed by ELISA as described previously (33). In these experiments, the ability of the monoclonal antibodies to bind to a panel of seven rgp120s, prepared from the IIIB, MN, Z6, Z321, NY-5, A244, and JRcsf isolates of HIV-1, was measured by ELISA (33). It was found that all of the CD4 blocking monoclonal antibodies were strain specific and bound only to gp120 from the MN strain of HIV-1 (Table 4). However, other antibodies from the same fusion (1026, 1092, and 1126) exhibited much broader cross reactivity (Table 4, FIG. 2), as did a CD4 blocking monoclonal antibody to IIIB-rgp120 (13H8) described previously (33).

Further studies were performed to characterize the neutralizing activity of the antibodies to MN-rgp120. In these studies, monoclonal antibodies were incubated with cell free virus (HIV-1MN), and the resulting mixture was then used to infect MT-2 cells in microtiter plates. After 5 days, the plates were developed by addition of the colorimetric dye MTT, and cell viability was measured spectrophotometrically. It was found (Table 4, FIG. 2) that all of the CD4 blocking monoclonal antibodies were able to inhibit viral infectivity. However, the potency of the monoclonal antibodies varied considerably: some (e.g., 1024) inhibited infection at very low concentrations (IC₅₀ of 0.08 μg per ml), whereas others (e.g., 1112) required much higher concentrations (IC₅₀ of 30 μg per ml). In control experiments, two monoclonal antibodies to MN-rgp120 from the same fusion (1086 and 1092) were ineffective, whereas the 1026 monoclonal antibody exhibited potent neutralizing activity. Similarly, monoclonal antibodies to the V3 domain of IIIB-rgp120 (10F6, 11G5), known to neutralize the infectivity of HIV-1IIIB (33), were unable to neutralize the HIV-1MN virus.
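The text reports IC₅₀ values in μg per ml, while Table 6 lists them in nM. Converting between the two requires a molecular mass for the antibody; the sketch below assumes about 150 kDa for an IgG, which the document does not state. Under that assumption 30 μg per ml corresponds to 200 nM, the value listed for 1112 in Table 6.

```python
IGG_MASS_G_PER_MOL = 150_000.0  # assumption: typical IgG molecular mass (~150 kDa)


def ug_per_ml_to_nm(conc_ug_per_ml: float, mass_g_per_mol: float = IGG_MASS_G_PER_MOL) -> float:
    """Convert a mass concentration (ug/ml) to a molar concentration (nM)."""
    grams_per_liter = conc_ug_per_ml * 1e-3   # 1 ug/ml == 1 mg/l == 1e-3 g/l
    molar = grams_per_liter / mass_g_per_mol  # mol/l
    return molar * 1e9                        # nmol/l


for mab, ic50 in (("1024", 0.08), ("1112", 30.0)):
    print(f"{mab}: {ic50} ug/ml ~ {ug_per_ml_to_nm(ic50):.2f} nM")
```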

Binding studies using synthetic peptides were then performed to further localize the epitopes recognized by these monoclonal antibodies, as described previously (33). When a peptide corresponding to the V3 domain (3) of MN-rgp120 was tested, none of the CD4 blocking antibodies showed any reactivity. However, the epitope recognized by the non-CD4 blocking monoclonal antibody 1026, prepared against MN-rgp120, could be localized to the V3 domain by virtue of its binding to this peptide. In other experiments, three synthetic peptides from the C4 domain of gp120 that incorporated sequences recognized by the CD4 blocking, weakly neutralizing monoclonal antibodies described by McKeating et al. (26) were tested (Table 4). None of the CD4 blocking monoclonal antibodies to MN-rgp120 reacted with these peptides; however, the non-neutralizing, CD4 blocking 13H8 monoclonal antibody bound to the peptides corresponding to residues 423-440 of IIIB-gp120 and residues 431-441 of MN-gp120, but not to that corresponding to residues 426-437 of IIIB-gp120. Thus the 13H8 monoclonal antibody recognized an epitope that was similar, if not identical, to that described by McKeating et al. (26). This result is consistent with the observation that the 13H8 monoclonal antibody and the monoclonal antibodies described by Cordell et al. (4) and McKeating et al. (26) exhibited considerable cross reactivity, whereas the antibodies to MN-rgp120 were highly strain specific.

CD4 blocking antibodies recognize epitopes in the C4 domain

Previously, a strain specific, CD4 blocking monoclonal antibody (5C2) raised against IIIB-rgp120 was found to recognize an epitope in the C4 domain of IIIB-rgp120 (21, 33). Although the 5C2 monoclonal antibody was able to block the binding of rgp120 to CD4, it was unable to neutralize HIV-1 infectivity in vitro (7). Affinity columns prepared from 5C2 adsorbed an 11 amino acid peptide (residues 422 to 432) from a tryptic digest of gp120 (21); however, monoclonal antibody 5C2 was unable to recognize this peptide when it was coated onto wells of microtiter dishes in an ELISA format (Nakamura et al., unpublished results).

To determine whether the CD4 blocking monoclonal antibodies raised against MN-rgp120 recognized the corresponding epitope in the C4 domain of MN-rgp120, a series of overlapping fragments spanning the V4 and C4 domains of HIV-1MN gp120 was prepared for expression in mammalian cells. A diagram of the fragments expressed is shown in FIGS. 3A and 3B. The C4 domain fragments were expressed as fusion proteins that incorporated the signal sequence and amino terminal 25 amino acids of HSV-1 glycoprotein D as described above.

Plasmids directing the expression of the chimeric C4 domain fragments were transfected into 293 cells, and their expression was monitored by radioimmunoprecipitation using a monoclonal antibody, 5B6, specific for the mature amino terminus of glycoprotein D. It was found (FIG. 3B) that all of the fragments were expressed and exhibited mobilities on SDS-PAGE gels appropriate for their size. Thus fMN.368-408 (lane 1) exhibited a mobility of 19 kD; fMN.368-451 (lane 2), 29 kD; fMN.419-433 (lane 3), 6 kD; and fMN.414-451 (lane 4), 6.1 kD.

The binding of monoclonal antibody 1024 to the recombinant fragments was then determined by ELISA (as described in Example 1). It was found (FIG. 3A) that monoclonal antibody 1024 reacted with the fragments that contained the entire C4 domain of MN-rgp120 (fMN₃₆₈₋₄₅₁, fMN₄₀₄₋₄₅₅), but failed to bind to a fragment derived from the adjacent V4 domain (fMN₃₆₈₋₄₀₈) or to another fragment that contained V4 domain sequences and the amino terminal half of the C4 domain (fMN₃₆₈₋₄₂₈). The fact that 1024 bound to the fMN₄₁₄₋₄₅₁ and fMN₄₁₉₋₄₄₃ fragments demonstrated that the epitopes recognized by all of these monoclonal antibodies were contained entirely between residues 419 and 443 in the C4 domain.
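The localization argument amounts to intersecting the residue ranges of all fragments that 1024 bound. A short Python sketch using the fragment boundaries given in the paragraph above follows; it assumes, as the RCM-gp120 results suggest, a sequential epitope lying entirely within every binding fragment, and the helper names are illustrative.

```python
# Fragment boundaries (first, last residue) and whether MAb 1024 bound them,
# taken from the paragraph above.
FRAGMENTS = {
    (368, 451): True,
    (404, 455): True,
    (414, 451): True,
    (419, 443): True,
    (368, 408): False,
    (368, 428): False,
}


def minimal_epitope_span(fragments):
    """Residues shared by every binding fragment (a contiguous span)."""
    binders = [span for span, bound in fragments.items() if bound]
    start = max(first for first, _ in binders)
    end = min(last for _, last in binders)
    if start > end:
        raise ValueError("binding fragments share no residues")
    return start, end


def non_binders_consistent(fragments, span):
    """A non-binding fragment must not contain the whole candidate span."""
    start, end = span
    return all(not (first <= start and last >= end)
               for (first, last), bound in fragments.items() if not bound)


span = minimal_epitope_span(FRAGMENTS)
print(span, non_binders_consistent(FRAGMENTS, span))
```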

Residues recognized by monoclonal antibodies that block binding of MN-rgp120 to CD4

To identify specific amino acid residues that might be part of the epitopes recognized by these monoclonal antibodies, the sequence of the C4 domain of MN-rgp120 was compared to those of the six other rgp120s that failed to react with the CD4 blocking monoclonal antibodies (FIG. 4). The sequence of MN-rgp120 was unique in that K occurred at position 429, whereas the other rgp120s possessed E, G, or R at this position. Another difference was noted at position 440, where E replaced K or S. To evaluate the significance of these substitutions, a series of point mutations was introduced into the MN-rgp120 gene (FIG. 5). Plasmids expressing the mutant proteins were transfected into 293s cells, and expression was verified by radioimmunoprecipitation with a monoclonal antibody (1034) directed to the V3 domain of MN-rgp120. Cell culture supernatants were harvested and used for the monoclonal antibody binding studies shown in Table 6. To verify expression, radioimmunoprecipitation studies using cell culture supernatants from cells metabolically labeled with [³⁵S]-methionine were performed using the 1024 monoclonal antibody specific for the C4 domain of MN-rgp120 (A) or the 1034 monoclonal antibody specific for the V3 domain of MN-rgp120. Immune complexes were precipitated with fixed S. aureus, and the adsorbed proteins were resolved by SDS-PAGE and visualized by autoradiography. The samples were: lane 1, MN.419A; lane 2, MN.421A; lane 3, MN.429E; lane 4, MN.429A; lane 5, MN.432A; lane 6, MN.440A; lane 7, MN-rgp120. The immunoprecipitation study showed that the 1024 antibody bound well to all of the variants except those in lanes 3 and 4, which are mutated at residue 429. The 1034 antibody, which recognizes the V3 domain, was used as a positive control.

The effect of these mutations on the binding of the CD4 blockingmonoclonal antibodies was then evaluated by ELISA as illustrated inTable 5, below.

                  TABLE 5
______________________________________________________________________
Binding of CD4 blocking monoclonal antibodies to C4 domain mutants
______________________________________________________________________
Proteins/MAbs  1024   1093   1096   1097   1110   1112   1127   5C2
______________________________________________________________________
MN-rgp120      1.0    1.0    1.0    1.0    1.0    1.0    1.0    0.05
MN-419A        1.11   1.10   0.94   1.21   0.78   0.95   1.10   ND
MN-421A        1.11   1.60   0.88   1.42   1.34   0.91   1.10   ND
MN-429E        0.03   0.07   0.11   0.04   0.10   0.10   0.02   ND
MN-429A        0.10   0.07   0.14   0.04   0.09   0.11   0.05   ND
MN-432A        0.77   0.15   0.59   0.08   0.12   0.24   0.26   ND
MN-440A        1.06   1.13   1.08   0.87   1.12   1.0    1.3    ND
IIIB-rgp120    0.03   ND     ND     ND     ND     ND     ND     1.0
MN-423F        ND     ND     ND     ND     ND     ND     ND     0.45
MN-423F,429E   ND     ND     ND     ND     ND     ND     ND     1.09
______________________________________________________________________

Data represent the relative binding of MAbs to the native and mutant forms of rgp120. Values were calculated by dividing the binding (determined by ELISA) of the CD4 blocking MAbs to the proteins indicated by the binding of a V3-specific MAb (1034) to the same proteins (as described in Example 1). ND, not determined.

It was found that replacement of K₄₄₀ with an A residue (MN.440A) had no effect on the binding of the 1024 monoclonal antibody or any of the other CD4 blocking monoclonal antibodies (Table 5). The significance of K at position 429 was then evaluated by substitution of either A (MN.429A) or E (MN.429E) at this location. The A for K substitution at position 429 (MN.429A) markedly reduced the binding of the 1024 monoclonal antibody and all of the other CD4 blocking monoclonal antibodies (Table 5). Similarly, replacement of K by E at this position (MN.429E) totally abrogated the binding of the 1024 monoclonal antibody and all of the other CD4 blocking monoclonal antibodies (Table 5). Several other mutants were constructed to evaluate the role of positively charged residues in the C4 domain. A for K substitutions at positions 419 (MN.419A) and 421 (MN.421A) failed to interfere with the binding of any of the CD4 blocking monoclonal antibodies as illustrated in Table 6, below.

                  TABLE 6
______________________________________________________________________
Correlation Between Antibody Binding Affinity and Virus Neutralizing Activity
______________________________________________________________________
MAb            Block^a   Kd, nM^c      IC₅₀, nM^d
______________________________________________________________________
1024^c         +         2.7 ± 0.9     0.4
1086^e,f       -         9.7 ± 2.2     --
1093^e         +         9.9 ± 2.6     3.3
1096^c         +         10 ± 6        12
1097^c         +         13.4 ± 3.7    12
1110^c         +         12.1 ± 1.7    12
1112^c         +         20 ± 4.4      200
1127^c         +         9.3 ± 4       3.3
13H8^f,g       +^b       22 ± 6        --
______________________________________________________________________
^a Blocked binding of MN-rgp120 to CD4.
^b Blocked binding of IIIB-rgp120, not MN-rgp120, to CD4.
^c Mean of four determinations calculated using the method of Scatchard (40).
^d Neutralization of HIV-1MN infectivity in vitro.
^e Anti-MN-rgp120 antibody.
^f Did not neutralize HIV-1 infectivity.
^g Anti-IIIB-rgp120 antibody.

However, when K at position 432 was replaced with A (MN.432A), the binding of all of the CD4 blocking antibodies was markedly reduced (Table 5). Interestingly, the binding of monoclonal antibody 1024 appeared less affected by this substitution than that of the other monoclonal antibodies (Table 5). Thus, these studies demonstrated that K₄₂₉ and K₄₃₂ were critical for the binding of all of the CD4 blocking monoclonal antibodies, and that K₄₁₉, K₄₂₁, and K₄₄₀ did not appear to play a role in monoclonal antibody binding.
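The conclusion drawn from Table 5 can be expressed as a simple screen over the tabulated values. In this Python sketch the cutoff of 0.5 and the use of the median across the seven CD4 blocking MAbs are assumptions chosen for illustration; the source does not define a numerical threshold for "markedly reduced" binding.

```python
from statistics import median

# Relative binding from Table 5 for MAbs 1024, 1093, 1096, 1097, 1110, 1112, 1127.
TABLE5 = {
    "MN-419A": [1.11, 1.10, 0.94, 1.21, 0.78, 0.95, 1.10],
    "MN-421A": [1.11, 1.60, 0.88, 1.42, 1.34, 0.91, 1.10],
    "MN-429E": [0.03, 0.07, 0.11, 0.04, 0.10, 0.10, 0.02],
    "MN-429A": [0.10, 0.07, 0.14, 0.04, 0.09, 0.11, 0.05],
    "MN-432A": [0.77, 0.15, 0.59, 0.08, 0.12, 0.24, 0.26],
    "MN-440A": [1.06, 1.13, 1.08, 0.87, 1.12, 1.0, 1.3],
}

CUTOFF = 0.5  # hypothetical threshold for "markedly reduced" binding


def critical_mutants(table, cutoff=CUTOFF):
    """Mutants whose median relative binding across the MAbs is below cutoff."""
    return [name for name, values in table.items() if median(values) < cutoff]


print(critical_mutants(TABLE5))
```

Using the median rather than the mean keeps the single higher value for 1024 against MN-432A from masking the loss of binding seen with the other antibodies.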

Amino acids recognized by monoclonal antibodies that block binding of IIIB-rgp120 to CD4

The identification of residues 429 and 432 as part of the epitope recognized by the MN-rgp120-specific CD4 blocking monoclonal antibodies was particularly interesting, since this region had previously been implicated in the binding of the 5C2 monoclonal antibody (21). The 1024-like monoclonal antibodies and the 5C2 monoclonal antibody differed from the C4-reactive monoclonal antibodies described by other investigators (4, 43) in that the former appeared strain specific, whereas the latter were broadly cross reactive. To account for the strain specificity of these monoclonal antibodies, the sequence of the eleven amino acid peptide of IIIB-rgp120 recognized by monoclonal antibody 5C2 was compared to the corresponding sequence of MN-rgp120. The IIIB protein differed from the MN protein at position 429, where MN has K in place of E, and at position 423, where MN has I in place of F (FIG. 5). Because previous studies (33) had shown that the 5C2 monoclonal antibody was unable to bind to gp120 from two strains (i.e., NY-5 and JRcsf) that also possessed E at position 429, it seemed unlikely that this position alone could account for the strain specificity of 5C2. Sequence comparison (FIG. 5) also showed that gp120 from HIV-1IIIB was unique in that a phenylalanine residue occurred at position 423, whereas the other six strains examined possess an I at this position.

To determine whether residues 423 and/or 429 could account for the type specificity of the 5C2 monoclonal antibody, a mutant of MN-rgp120 was constructed that incorporated an F for I replacement at position 423 (MN.423F). In addition, the MN-rgp120 mutant MN.429E (described above) was further mutagenized to incorporate the F for I substitution at position 423, resulting in a double mutant (MN.423F,429E) whose sequence was identical to that of IIIB-rgp120 within the 10 amino acid 5C2 epitope (FIG. 4). The expression of these mutants in 293s cells was verified by radioimmunoprecipitation using rabbit polyclonal antisera to MN-rgp120. When the binding of the 13H8 monoclonal antibody to a set of mutants incorporating substitutions at positions 423 and 429 was examined, none of the replacements affected the binding of this antibody (data not shown). When the 5C2 monoclonal antibody was examined, the F for I replacement (MN.423F) was found to confer partial reactivity (Table 5). When the double mutant (MN.423F,429E), containing the F for I substitution as well as the E for K substitution, was tested, binding indistinguishable from that to IIIB-rgp120 was observed (Table 5). These results demonstrated that F at position 423 and E at position 429 both play a role in binding of the 5C2 monoclonal antibody, and suggest that the strain specificity of 5C2 can be attributed to the residues at these positions.

Examination of the sequences of gp120 from the various clones of LAI that have been analyzed revealed that several substrains of LAI differed from each other in the C4 domain. Thus the sequences of the IIIB (30), Bru (46), and HXB3 (6) clones of LAI were identical at positions 423 and 429, where F and E residues occurred, respectively. However, the sequence of the HXB2 substrain (36) differed from the others at both positions: like MN-rgp120, it has K in place of E at position 429 and I in place of F at position 423 (FIG. 5). Similarly, the HX10 and BH10 substrains (36, 37) differed only at position 423, where, like HIV-1MN, I replaced F. Based on the mutagenesis experiments above, it would be predicted that monoclonal antibody 1024 should be able to bind to gp120 from the HXB2 substrain of LAI, but not from the HXB3 substrain. If I₄₂₃ were important for binding, then 1024 should also bind the HX10 substrain.

To test this hypothesis, the binding of monoclonal antibody 1024 to the surface of cells infected with the IIIB, HXB2, HXB3, or HX10 substrains of HIV-1LAI was measured by flow cytometry. Monoclonal antibody 1024 bound only to HXB2-infected cells, providing further confirmation that residues 423 and 429 were important for the binding of this antibody. The fact that monoclonal antibody 1024 did not bind to HX10-infected cells suggested that I₄₂₃ was not important for the binding of this monoclonal antibody. Thus these studies demonstrate that reactivity with the 1024 monoclonal antibody segregates with the occurrence of F and E residues at positions 423 and 429, respectively, and show that substrains of HIV-1LAI differ from one another at a functionally significant epitope in the C4 domain.
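The reasoning in the last two paragraphs can be checked by encoding the residues at positions 423 and 429 for each substrain and comparing a simple prediction rule with the flow cytometry result. The rule in this Python sketch (K at position 429 is required, position 423 is ignored) is an interpretation of the mutagenesis and HX10 data rather than a rule stated by the source; Bru and BH10 were not tested, so their entries are only predictions.

```python
# Residues at gp120 positions 423 and 429 for LAI substrains, as listed above.
SUBSTRAINS = {
    "IIIB": ("F", "E"),
    "Bru": ("F", "E"),
    "HXB3": ("F", "E"),
    "HXB2": ("I", "K"),
    "HX10": ("I", "E"),
    "BH10": ("I", "E"),
}

# Flow cytometry result for the substrains that were tested with MAb 1024.
OBSERVED_1024 = {"IIIB": False, "HXB2": True, "HXB3": False, "HX10": False}


def predicts_1024_binding(residue_423: str, residue_429: str) -> bool:
    """Hypothetical rule: binding requires K at 429; residue 423 is ignored
    because HX10 (I423, E429) was not bound."""
    return residue_429 == "K"


predicted = {name: predicts_1024_binding(*res) for name, res in SUBSTRAINS.items()}
agrees = all(predicted[name] == seen for name, seen in OBSERVED_1024.items())
print(predicted)
print("consistent with flow cytometry:", agrees)
```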

Neutralizing activity of CD4 blocking antibodies correlates with their binding affinity

To account for the difference in virus neutralizing activity between the CD4 blocking monoclonal antibodies, their gp120 binding affinities were determined by competitive binding of ¹²⁵I-labeled monoclonal antibody to rgp120 (Table 6). A typical Scatchard analysis of data from these assays is shown in FIG. 7 (A to C). Linear, one-site binding kinetics were observed for all the monoclonal antibodies to MN-rgp120, suggesting that only a single class of sites was recognized and that there was no cooperativity between the two combining sites of each immunoglobulin molecule. It was found (FIG. 7A, Table 6) that monoclonal antibody 1024, which exhibited the most potent virus neutralizing activity (IC₅₀ of 0.08 μg per ml), possessed the lowest Kd (2.7 nM). In contrast (FIG. 7C, Table 6), monoclonal antibody 1112, which exhibited the weakest virus neutralizing activity (IC₅₀ of 30 μg per ml), possessed the highest Kd (20 nM). Kds for six additional CD4-blocking monoclonal antibodies raised against MN-rgp120 were also determined (Table 6). Monoclonal antibodies with intermediate Kds similarly possessed intermediate neutralization IC₅₀ values. To explore the relationship between virus neutralizing activity and gp120 binding affinity, the data in Table 6 were plotted in several different ways. When the Kd of the monoclonal antibodies was plotted as a function of the log of the IC₅₀, a linear relationship was obtained (FIG. 8), with a correlation coefficient (r) of 0.97. Thus, this graph demonstrates that the virus neutralizing activity of these monoclonal antibodies is directly related to their gp120 binding affinity, and that the threshold for neutralization at this epitope is defined by the slope of the graph in FIG. 8.
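The correlation reported above can be recomputed from Table 6. The Python sketch below uses the seven CD4 blocking antibodies raised against MN-rgp120 (1086 and 13H8 have no IC₅₀) and a plain Pearson coefficient between Kd and log₁₀ IC₅₀; whether FIG. 8 used exactly these points is an assumption. With these values r comes out at about 0.97, in line with the text.

```python
import math

# (Kd in nM, IC50 in nM) for the CD4 blocking anti-MN-rgp120 MAbs in Table 6.
TABLE6 = {
    "1024": (2.7, 0.4),
    "1093": (9.9, 3.3),
    "1096": (10.0, 12.0),
    "1097": (13.4, 12.0),
    "1110": (12.1, 12.0),
    "1112": (20.0, 200.0),
    "1127": (9.3, 3.3),
}


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


kd_nm = [kd for kd, _ in TABLE6.values()]
log_ic50 = [math.log10(ic50) for _, ic50 in TABLE6.values()]
r = pearson_r(kd_nm, log_ic50)
print(f"r = {r:.2f}")
```

The base of the logarithm does not affect r, since changing it only rescales the IC₅₀ axis.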

A similar analysis was performed with the non-neutralizing CD4 blocking monoclonal antibodies to IIIB-rgp120, 5C2 and 13H8. The binding curve for 13H8 (FIG. 7C) showed that it bound to a single class of sites on IIIB-rgp120 with a Kd of 22 nM. The affinity of 5C2 could not be determined by this assay because at antibody concentrations greater than 5 nM, non-linear binding (reduced gp120 binding) was observed. This effect was suggestive of steric hindrance at these concentrations or of negative cooperativity between combining sites. The binding affinity was also determined for the non-neutralizing, non-CD4 blocking monoclonal antibody to MN-rgp120, 1086. The fact that this antibody exhibited a binding affinity (9.7 nM) similar to that of many of the neutralizing monoclonal antibodies but failed to inhibit infectivity shows that high antibody binding affinity alone is not sufficient for neutralization.

Effect of C4 Domain Mutants on CD4 binding

Finally, the CD4 binding properties of the series of MN-rgp120 mutants constructed to localize the C4 domain epitopes were measured. The ability of the mutagenized MN-rgp120 variants to co-immunoprecipitate CD4 was evaluated as described previously (21) in a qualitative co-immunoprecipitation assay similar to that described previously (19). Briefly, 293 cells transfected with plasmids directing the expression of the MN-rgp120 variants described in FIG. 5 were metabolically labeled with [³⁵S]-methionine, and the growth-conditioned cell culture supernatants were incubated with rsCD4. The resulting rsCD4:gp120 complexes were then immunoprecipitated by addition of the CD4-specific monoclonal antibody 465 (A) or of a positive control monoclonal antibody (1034) directed to the V3 domain of MN-rgp120 (B). The immunoprecipitated proteins were resolved by SDS-PAGE and visualized by autoradiography as described previously (3). The samples were: lane 1, MN.419A; lane 2, MN.421A; lane 3, MN.429E; lane 4, MN.429A; lane 5, MN.432A; lane 6, MN.440A; lane 7, MN-rgp120. The gel showed that the mutations that abolish antibody binding do not block binding of CD4. Therefore, the antibodies do not bind to the gp120 CD4 contact residues. This indicates that the antibodies may inhibit CD4 binding by steric hindrance rather than by binding directly to the CD4 contact residues.

It was found that all of the variants in which an apolar A residue was substituted for a charged K or E residue (e.g., MN.419A, MN.421A, MN.432A, and MN.440A) were still able to co-immunoprecipitate rsCD4. Similarly, the replacement of K by E at position 429 (MN.429E), the replacement of I by F at position 423 (MN.423F), and the mutant that incorporated both mutations (MN.423F,429E) showed no reduction in their ability to co-immunoprecipitate rsCD4. Thus, radical amino acid substitutions at five positions failed to affect the binding of gp120 to CD4. These results were consistent with previous studies (5, 21, 34) in which only a few of the many mutations that have been induced in this region affected CD4 binding.

This study indicates that neutralizing epitopes in the C4 domain are located between about residues 420 and 440, and that the critical residues for antibody binding are residues 429 and 432.

EXAMPLE 2 Identification of V2 Neutralizing Epitopes

The procedures described in Example 1 were used to map epitopes in the V2 region of gp120. Table 7 illustrates the results of mutagenesis studies to map V2 neutralizing epitopes. In the table, the columns compare the binding of the monoclonal antibodies to wild type (WT) gp120 with their binding to various gp120 mutants, identified using standard notation. For example, "G171R" indicates that the glycine (G) at residue 171 has been replaced by an arginine (R), and "172A/173A" indicates that the residues at 172 and 173 have both been replaced by alanine. The neutralizing monoclonal antibodies (MAbs) tested are listed in the rows. The numerical values in the table are the optical density values of an ELISA performed as described in Example 1 to measure the amount of antibody binding. The underlined values indicate significantly reduced binding, indicating that the substituted residue is critical for binding of the antibody.

                  TABLE 7
______________________________________________________________________
MAbs    WT      G171R,M174V   172A/173A   E187V   187V/188S
______________________________________________________________________
6E10    1.00    0.10          1.28        0.60    0.25
1017    1.00    0.70          1.10        0.87    0.04
1022    1.00    0.80          1.10        1.00    0.00
1028    1.00    0.90          1.18        1.07    0.04
1029    1.00    0.83          1.16        1.01    0.16
1019    1.00    0.13          1.30        0.75    0.74
1027    1.00    0.00          1.20        0.80    0.64
1025    1.00    0.69          0.00        0.00    0.83
1088    1.00    0.73          1.12        0.94    0.03
13H8    1.00    0.77          0.78        0.48    0.65
______________________________________________________________________
MAbs    WT      177A          172A/173A   188A    183A
______________________________________________________________________
6E10    1.00    0.36          0.52        0.64    0.43
1017    1.00    0.77          0.77        0.76    0.11
1022    1.00    0.86          0.72        0.14    0.00
1028    1.00    0.93          0.78        0.49    0.04
1029    1.00    0.88          0.85        0.53    0.16
1019    1.00    0.16          0.00        0.41    0.44
1027    1.00    0.00          0.02        0.41    0.49
1025    1.00    0.75          0.0         0.83    0.72
1088    1.00    0.77          0.77        0.53    0.00
13H8    1.00    0.72          0.72        0.53    0.60
______________________________________________________________________

As illustrated in Table 7, the study demonstrated that a series of overlapping neutralizing epitopes is located in the V2 region (residues 163 through 200). In addition, the study indicates that the residues in the V2 domain critical for antibody binding are residues 171, 173, 174, 177, 181, 183, 187, and 188.
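The text does not give a numerical threshold for "significantly reduced binding". As an illustrative sketch only, the Python code below applies an assumed cutoff of 0.30 (with WT = 1.00) to the first panel of Table 7 and groups the MAbs that lose binding to the same set of mutants, which is one simple way to see overlapping epitopes in the data. The cutoff value and the helper names are hypothetical; a different cutoff will classify borderline values (for example 0.48 for 13H8) differently.

```python
from collections import defaultdict

# First panel of Table 7: binding relative to wild-type gp120 (WT = 1.00).
MUTANTS = ("G171R/M174V", "172A/173A", "E187V", "187V/188S")
PANEL_1 = {
    "6E10": (0.10, 1.28, 0.60, 0.25),
    "1017": (0.70, 1.10, 0.87, 0.04),
    "1022": (0.80, 1.10, 1.00, 0.00),
    "1028": (0.90, 1.18, 1.07, 0.04),
    "1029": (0.83, 1.16, 1.01, 0.16),
    "1019": (0.13, 1.30, 0.75, 0.74),
    "1027": (0.00, 1.20, 0.80, 0.64),
    "1025": (0.69, 0.00, 0.00, 0.83),
    "1088": (0.73, 1.12, 0.94, 0.03),
    "13H8": (0.77, 0.78, 0.48, 0.65),
}

# ASSUMPTION: the source does not define "significantly reduced"; 0.30 is illustrative.
CUTOFF = 0.30


def sensitive_mutants(panel, mutants=MUTANTS, cutoff=CUTOFF):
    """Map each MAb to the mutants whose relative binding falls below the cutoff."""
    return {
        mab: [m for m, value in zip(mutants, values) if value < cutoff]
        for mab, values in panel.items()
    }


def group_by_pattern(hits):
    """Group MAbs that lose binding to exactly the same set of mutants."""
    groups = defaultdict(list)
    for mab, mutants in hits.items():
        groups[tuple(mutants)].append(mab)
    return dict(groups)


for pattern, mabs in group_by_pattern(sensitive_mutants(PANEL_1)).items():
    print(pattern or "(none below cutoff)", mabs)
```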

EXAMPLE 3 Immunization Studies

gp120 from the MN, GNE₈, and GNE₁₆ strains of HIV was prepared by amplifying the gene from each isolate and cloning and expressing it in CHO cells as described in Berman et al., J. Virol. 66:4464-4469 (1992). Briefly, the gp160 gene was amplified in two rounds using the following nested primers, according to the protocol of Kellogg et al., pp. 337-347 in PCR Protocols: A Guide to Methods and Applications, Innis et al. (eds.), Academic Press, Inc., New York (1990).

First round primers:

AATAATAGCAATAGTTGTGTGGWCC (W is A or T)

ATTCTTTCCCTTAYAGTAGGCCATCC (Y is T or C)

Second round primers:

GGGAATTCGGATCCAGAGCAGAAGACAGTGGCAATGA

GTCAAGAATTCTTATAGCAAAGCCCTTTCCAA

The primers are SEQ ID NOs: 31-34. Each gene was then digested with the restriction endonucleases KpnI and AccI. The resulting fragment was subcloned into the Bluescript (+) phagemid M13 vector (Stratagene, Inc.) and sequenced by the dideoxynucleotide method (Sanger et al., Proc. Natl. Acad. Sci. USA 74:5463-5467 (1977)).
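Two sequence-handling details are implicit in this procedure. The first-round primers are degenerate (W and Y are IUPAC codes), so each is a mixture of two oligonucleotides, and the cloning step depends on the KpnI (GGTACC) and AccI (GTMKAC, where M = A or C and K = G or T) recognition sites. The Python sketch below, illustrative only and not part of the original protocol, enumerates the concrete sequences encoded by a degenerate primer and reports 0-based start positions of recognition sites in a sequence. The helper names are hypothetical; cut positions within the sites are not modeled, and because both sites are palindromic only one strand is scanned.

```python
import re
from itertools import product

# Standard IUPAC nucleotide codes (e.g. W = A or T, Y = C or T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Recognition sequences (5'->3') of the two enzymes named in the text.
ENZYMES = {"KpnI": "GGTACC", "AccI": "GTMKAC"}


def expand_degenerate(primer: str) -> list:
    """Return every concrete oligonucleotide encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


def site_pattern(site: str) -> str:
    """Turn a degenerate recognition site into a regex (e.g. M -> [AC])."""
    return "".join(
        base if len(IUPAC[base]) == 1 else f"[{IUPAC[base]}]" for base in site
    )


def find_sites(seq: str, enzymes=ENZYMES) -> dict:
    """0-based start of every site; the lookahead also reports overlapping matches."""
    seq = seq.upper()
    return {
        name: [m.start() for m in re.finditer(f"(?={site_pattern(site)})", seq)]
        for name, site in enzymes.items()
    }


print(expand_degenerate("AATAATAGCAATAGTTGTGTGGWCC"))
# Nucleotides 97-144 of SEQ ID NO:27.
segment = "TTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACC"
print(find_sites(segment))
```

Applied to nucleotides 97-144 of SEQ ID NO:27, `find_sites` reports one KpnI site at position 23 and no AccI site.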

A fragment of the gp120 coding region was then used to construct a chimeric gene for expression in mammalian cells, as described in Lasky et al., Science 233:209-212 (1986). The 5' end was fused to a polylinker adjacent to a simian virus 40 (SV40) promoter, and the 3' end was fused to a polylinker adjacent to 3' untranslated sequences containing an SV40 polyadenylation signal. The expression vector (MN-rgp120) was co-transfected into CHO cells deficient in production of the enzyme dihydrofolate reductase, along with a plasmid (pSVdhfr) containing a cDNA encoding the selectable marker dihydrofolate reductase. Cell lines expressing MN-rgp120 were isolated as described in Lasky et al., Science 233:209-212 (1986). The recombinant glycoprotein was purified from growth-conditioned cell culture medium by immunoaffinity and ion exchange chromatography as described in Leonard et al., J. Biol. Chem. 265:10373-10382 (1990).
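The sequence listing pairs each nucleotide triplet of the cloned coding regions with its three-letter amino acid. The following Python sketch, offered as an illustration rather than as the method used to prepare the listing, translates an in-frame coding sequence with the standard genetic code so that a listed nucleotide sequence can be checked against its listed translation. The function name is hypothetical. Trailing bases that do not complete a codon are ignored, which matters for SEQ ID NO:27 (2552 nucleotides, that is, 850 codons plus two extra bases).

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def translate(dna: str, three_letter: bool = True) -> str:
    """Translate an in-frame coding sequence; an incomplete final codon is ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    residues = [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]
    if three_letter:
        return "".join(THREE_LETTER[r] for r in residues)
    return "".join(residues)


# First 48 nucleotides of SEQ ID NO:27.
print(translate("ATGATAGTGAAGGGGATCAGGAAGAATTGTCAGCACTTGTGGAGATGG"))
```

For the first 48 nucleotides of SEQ ID NO:27 this returns "MetIleValLysGlyIleArgLysAsnCysGlnHisLeuTrpArgTrp", matching the first 16 residues of SEQ ID NO:28.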

gp120 from the GNE₈ and GNE₁₆ strains of HIV is prepared in the samemanner as described for the MN isolate.

MN-rgp120 (300 μg/injection), GNE₈-rgp120 (300 μg/injection), and GNE₁₆-rgp120 (300 μg/injection) are prepared in an aluminum hydroxide adjuvant (as described in Cordonnier et al., Nature 340:571-574 (1989)). Six chimpanzees are injected at 0, 4, and 32 weeks. Sera are collected at the time of each immunization and three weeks thereafter, and are assayed for neutralizing antibodies to each strain of HIV. At 35 weeks, each of the chimpanzees has significant levels of neutralizing antibodies to each strain.
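Sampling at each immunization and three weeks after it yields a fixed set of collection weeks. A trivial Python sketch of that arithmetic, with hypothetical constant and function names:

```python
IMMUNIZATION_WEEKS = (0, 4, 32)   # injection times stated in the text
POST_DOSE_SAMPLE_DELAY = 3        # "three weeks thereafter"


def serum_collection_weeks(doses=IMMUNIZATION_WEEKS, delay=POST_DOSE_SAMPLE_DELAY):
    """Weeks at which sera are collected: at each dose and `delay` weeks after it."""
    return sorted({week for dose in doses for week in (dose, dose + delay)})


print(serum_collection_weeks())  # [0, 3, 4, 7, 32, 35]
```

The last collection falls at week 35, the same week at which neutralizing antibody levels are evaluated before challenge.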

At 35 weeks, the chimpanzees are randomly assigned to three groups. Each group is challenged with about 10 50% chimpanzee infectious doses (CID₅₀) of one of the three vaccine isolates. For each vaccine strain, one unimmunized control chimpanzee is also challenged with the same dose of virus as the immunized chimpanzees.
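The random assignment of the six immunized animals to three equal challenge groups can be sketched as a seeded shuffle followed by an even split. This is an illustration only: the source does not describe the randomization procedure, and the animal identifiers and function name below are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence

STRAINS = ("MN", "GNE8", "GNE16")


def assign_challenge_groups(
    animals: Sequence[str], strains: Sequence[str] = STRAINS, seed: Optional[int] = None
) -> Dict[str, List[str]]:
    """Randomly split immunized animals into equal-sized groups, one per challenge strain."""
    if len(animals) % len(strains):
        raise ValueError("animals must divide evenly among the challenge strains")
    rng = random.Random(seed)  # seeded generator gives a reproducible allocation
    shuffled = list(animals)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(strains)
    return {strain: shuffled[i * size:(i + 1) * size] for i, strain in enumerate(strains)}


# Hypothetical identifiers for the six immunized chimpanzees.
animals = [f"chimp-{n}" for n in range(1, 7)]
groups = assign_challenge_groups(animals, seed=2024)
controls = {strain: f"control-{strain}" for strain in STRAINS}  # one unimmunized control per strain
print(groups, controls)
```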

Blood samples are drawn every two weeks throughout the study. Sera are assayed for antibodies to HIV core proteins, and the presence of HIV is assayed by PCR amplification and by co-cultivation of peripheral blood mononuclear cells (PBMCs) from each chimpanzee with activated human or chimpanzee PBMCs. The presence of antibodies to core proteins indicates viral infection, as does the detection of amplified viral DNA or viral infection of the co-cultivated cells.
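The readout rule above treats any single positive assay as evidence of infection. A minimal Python sketch of that rule over a series of sampling time points, reporting the first week at which infection is detected; the record layout and names are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Sample:
    week: int                  # weeks post challenge
    core_antibody: bool        # antibodies to HIV core proteins in serum
    pcr_positive: bool         # amplified viral DNA detected
    coculture_positive: bool   # virus recovered by PBMC co-cultivation


def is_infected(sample: Sample) -> bool:
    """Any one positive assay is taken as evidence of infection."""
    return sample.core_antibody or sample.pcr_positive or sample.coculture_positive


def first_positive_week(samples: Iterable[Sample]) -> Optional[int]:
    """Earliest sampling week with evidence of infection, or None if never positive."""
    positive = [s.week for s in samples if is_infected(s)]
    return min(positive) if positive else None
```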

The presence of virus is detected by PCR and co-cultivation methods in each unimmunized control animal between weeks 2 and 4 post challenge. Antibodies to core proteins appear in the control chimpanzees at six weeks post challenge. Neither virus nor antibodies are detectable in any of the immunized chimpanzees at one year post challenge, indicating that the vaccine effectively protects the chimpanzees from infection by each of the challenge strains.

REFERENCES

1. Berman, P. W. et al., 1989. Expression and immunogenicity of the extracellular domain of the human immunodeficiency virus type 1 envelope glycoprotein, gp160. J. Virol. 63:3489-3498.

2. Berman, P. W. et al., 1990. Protection of chimpanzees from infection by HIV-1 after vaccination with gp120 but not gp160. Nature 345:622-625.

3. Berman, P. W. et al., 1992. Neutralization of multiple laboratory and clinical isolates of HIV-1 by antisera raised against gp120 from the MN isolate of HIV-1. J. Virol. 66:4464-4469.

4. Cordell, J. et al., 1991. Rat monoclonal antibodies to non-overlapping epitopes of human immunodeficiency virus type 1 gp120 block CD4 binding in vitro. Virology 185:72-79.

5. Cordonnier, A. et al., 1989. Single amino acid changes in HIV envelope affect viral tropism and receptor binding. Nature 340:571-574.

6. Crowl, R. et al., 1985. HTLV-III env gene products synthesized in E. coli are recognized by antibodies present in the sera of AIDS patients. Cell 41:979-986.

7. Dowbenko, D. et al., 1988. Epitope mapping of the human immunodeficiency virus type 1 gp120 with monoclonal antibodies. J. Virol. 62:4703-4711.

8. Eaton, D. et al., 1986. Construction and characterization of an active factor VIII variant lacking the central one-third of the molecule. Biochemistry 25:8343-8347.

9. Fouchier, R. A. M. et al., 1992. Phenotype-associated sequence variation in the third variable domain of the human immunodeficiency virus type 1 gp120 molecule. J. Virol. 66:3183-3187.

10. Goudsmit, J. et al., 1988. Human immunodeficiency virus type 1 neutralization epitope with conserved architecture elicits early type-specific antibodies in experimentally infected chimpanzees. Proc. Natl. Acad. Sci. U.S.A. 85:4478-4482.

11. Graham, F. et al., 1973. A new technique for the assay of infectivity of human adenovirus 5 DNA. Virology 52:456-467.

12. Graham, F. L. et al., 1977. Characteristics of a human cell line transformed by DNA from human adenovirus type 5. J. Gen. Virol. 36:59-77.

13. Gurgo, C. et al., 1988. Envelope sequences of two new United States HIV-1 isolates. Virology 164:531-536.

14. Haffar, O. K. et al., 1991. The cytoplasmic tail of HIV-1 gp160 contains regions that associate with cellular membranes. Virology 180:439-441.

15. Higuchi, R., 1990. Recombinant PCR, p. 177-183. In M. A. Innis et al. (eds.), PCR Protocols: A Guide to Methods and Applications. Academic Press, Inc., New York.

16. Ho, D. D. et al., 1991. Conformational epitope on gp120 important in CD4 binding and human immunodeficiency virus type 1 neutralization identified by a human monoclonal antibody. J. Virol. 65:489-493.

17. Kellogg, D. E. et al., 1990. Detection of human immunodeficiency virus, p. 337-347. In M. A. Innis et al. (eds.), PCR Protocols: A Guide to Methods and Applications. Academic Press, Inc., New York.

18. Laemmli, U. K., 1970. Cleavage of structural proteins during the assembly of the head of bacteriophage T4. Nature 227:680-685.

19. Langedijk, J. P. M. et al., 1991. Neutralizing activity of anti-peptide antibodies against the principal neutralization domain of human immunodeficiency virus type 1. J. Gen. Virol. 72:2519-2526.

20. LaRosa, G. J. et al., 1990. Conserved sequences and structural elements in the HIV-1 principal neutralizing determinant. Science 249:932-935.

21. Lasky, L. A. et al., 1987. Delineation of a region of the human immunodeficiency virus gp120 glycoprotein critical for interaction with the CD4 receptor. Cell 50:975-985.

22. Lasky, L. A. et al., 1986. Neutralization of the AIDS retrovirus by antibodies to a recombinant envelope glycoprotein. Science 233:209-212.

23. Matsushita, S. et al., 1988. Characterization of a human immunodeficiency virus neutralizing monoclonal antibody and mapping of a neutralizing epitope. J. Virol. 62:2107-2114.

24. McCutchan, F. E. et al., 1992. Genetic variants of HIV-1 in Thailand. AIDS Res. Hum. Retroviruses 8:1887-1895.

25. McKeating, J. et al., 1991. Recombinant CD4-selected human immunodeficiency virus type 1 variants with reduced gp120 affinity for CD4 and increased cell fusion capacity. J. Virol. 65:4777-4785.

26. McKeating, J. A. et al., 1992. Monoclonal antibodies to the C4 region of human immunodeficiency virus type 1 gp120: use in topological analysis of a CD4 binding site. AIDS Res. Hum. Retroviruses 8:451-459.

27. McNearney, T. et al., 1992. Relationship of human immunodeficiency virus type 1 sequence heterogeneity to stage of disease. Proc. Natl. Acad. Sci. U.S.A. 89:10247-10251.

28. Modrow, S. et al., 1987. Computer-assisted analysis of envelope protein sequences of seven human immunodeficiency virus isolates: predictions of antigenic epitopes in conserved and variable regions. J. Virol. 61:570-578.

29. Moore, J. P., 1990. Simple methods for monitoring HIV-1 and HIV-2 gp120 binding to sCD4 by ELISA: HIV-2 has a 25 fold lower affinity than HIV-1 for sCD4. AIDS 4:297-305.

30. Muesing, M. A. et al., 1985. Nucleic acid structure and expression of the human AIDS/lymphadenopathy retrovirus. Nature 313:450-458.

31. Munson, P. J. et al., 1983. LIGAND: a computerized analysis of ligand binding data. Methods Enzymol. 92:543.

32. Myers, G. et al., 1992. Human Retroviruses and AIDS. A compilation and analysis of nucleic acid and amino acid sequences. Los Alamos National Laboratory, Los Alamos, N. Mex.

33. Nakamura, G. et al., 1992. Monoclonal antibodies to the extracellular domain of HIV-1(IIIB) gp160 that neutralize infectivity, block binding to CD4, and react with diverse isolates. AIDS Res. Hum. Retroviruses 8:1875-1885.

34. Olshevsky, U. et al., 1990. Identification of individual human immunodeficiency virus type 1 gp120 amino acids important for CD4 receptor binding. J. Virol. 64:5701-5707.

35. Posner, M. R. et al., 1991. An IgG human monoclonal antibody which reacts with HIV-1/gp120, inhibits virus binding to cells and neutralizes infection. J. Immunol. 146:4325-4332.

36. Ratner, L. et al., 1987. Complete nucleotide sequences of functional clones of the AIDS virus. AIDS Res. Hum. Retroviruses 3:57-69.

37. Ratner, L. et al., 1985. Complete nucleotide sequence of the AIDS virus, HTLV-III. Nature 313:277-284.

38. Reitz, M. S. Jr. et al., 1992. On the historical origins of HIV-1 (MN) and (RF). AIDS Res. Hum. Retroviruses 8:1539-1541.

39. Rusche, J. R. et al., 1988. Antibodies that inhibit fusion of human immunodeficiency virus-infected cells bind to a 24-amino acid sequence of the viral envelope, gp120. Proc. Natl. Acad. Sci. U.S.A. 85:3198-3202.

40. Scatchard, G., 1949. The attractions of proteins for small molecules and ions. Ann. N.Y. Acad. Sci. 51:660-672.

41. Schnittman, S. M. et al., 1988. Characterization of gp120 binding to CD4 and an assay that measures ability of sera to inhibit this binding. J. Immunol. 141:4181-4186.

42. Scott, C. F. Jr. et al., 1990. Human monoclonal antibody that recognizes the V3 region of human immunodeficiency virus gp120 and neutralizes the human T-lymphotropic virus type III(MN) strain. Proc. Natl. Acad. Sci. U.S.A. 87:8597-8601.

43. Sun, N. C. et al., 1989. Generation and characterization of monoclonal antibodies to the putative CD4-binding domain of human immunodeficiency virus type 1 gp120. J. Virol. 63:3579-3585.

44. Tersmette, M. R. A. et al., 1989. Evidence for a role of virulent human immunodeficiency virus (HIV) variants in the pathogenesis of AIDS obtained from studies on a panel of sequential HIV isolates. J. Virol. 63:2118-2125.

45. Tilley, S. A. et al., 1991. A human monoclonal antibody against the CD4-binding site of HIV-1 gp120 exhibits potent, broadly neutralizing activity. Res. Virol. 142:247-259.

46. Wain-Hobson, S. et al., 1985. Nucleotide sequence of the AIDS virus, LAV. Cell 40:9-17.

47. Weiss, R. A. et al., 1986. Variable and conserved neutralizing antigen of human immunodeficiency virus. Nature 324:572-575.

    __________________________________________________________________________    SEQUENCE LISTING    (1) GENERAL INFORMATION:    (iii) NUMBER OF SEQUENCES: 33    (2) INFORMATION FOR SEQ ID NO:1:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 511 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:    MetArgValLysGlyIleArgArgAsnTyrGlnHisTrpTrpGlyArg    151015    GlyThrMetLeuLeuGlyLeuLeuMetIleCysSerAlaThrGluLys    202530    LeuTrpValThrValTyrTyrGlyValProValTrpLysGluAlaThr    354045    ThrThrLeuPheCysAlaSerAspAlaLysAlaTyrAspThrGluAla    505560    HisAsnValTrpAlaThrHisAlaCysValProThrAspProAsnPro    65707580    GlnGluValGluLeuValAsnValThrGluAsnPheAsnMetTrpLys    859095    AsnAsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAsp    100105110    GlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu    115120125    AsnCysThrAspLeuArgAsnThrThrAsnThrAsnAsnSerThrAsp    130135140    AsnAsnAsnSerLysSerGluGlyThrIleLysGlyGlyGluMetLys    145150155160    AsnCysSerPheAsnIleThrThrSerIleGlyAspLysMetGlnLys    165170175    GluTyrAlaLeuLeuTyrLysLeuAspIleGluProIleAspAsnAsp    180185190    SerThrSerTyrArgLeuIleSerCysAsnThrSerValIleThrGln    195200205    AlaCysProLysIleSerPheGluProIleProIleHisTyrCysAla    210215220    ProAlaGlyPheAlaIleLeuLysCysAsnAspLysLysPheSerGly    225230235240    LysGlySerCysLysAsnValSerThrValGlnCysThrHisGlyIle    245250255    ArgProValValSerThrGlnLeuLeuLeuAsnGlySerLeuAlaGlu    260265270    GluGluValValIleArgSerGluAspPheThrAspAsnAlaLysThr    275280285    IleIleValHisLeuLysGluSerValGlnIleAsnCysThrArgPro    290295300    AsnTyrAsnLysArgLysArgIleHisIleGlyProGlyArgAlaPhe    305310315320    TyrThrThrLysAsnIleLysGlyThrIleArgGlnAlaHisCysIle    325330335    IleSerArgAlaLysTrpAsnAspThrLeuArgGlnIleValSerLys    340345350    LeuLysGluGlnPheLysAsnLysThrIleValPheAsnProSerSer    355360365    GlyGlyAspProGluIleValMetHisSerPheAsnCysGlyGlyGlu    370375380    PhePheTyrCysAsnThrSerProLeuPheAsnSerIleTrpAsnGly    385390395400 
   AsnAsnThrTrpAsnAsnThrThrGlySerAsnAsnAsnIleThrLeu    405410415    GlnCysLysIleLysGlnIleIleAsnMetTrpGlnLysValGlyLys    420425430    AlaMetTyrAlaProProIleGluGlyGlnIleArgCysSerSerAsn    435440445    IleThrGlyLeuLeuLeuThrArgAspGlyGlyGluAspThrAspThr    450455460    AsnAspThrGluIlePheArgProGlyGlyGlyAspMetArgAspAsn    465470475480    TrpArgSerGluLeuTyrLysTyrLysValValThrIleGluProLeu    485490495    GlyValAlaProThrLysAlaLysArgArgValValGlnArgGlu    500505510    (2) INFORMATION FOR SEQ ID NO:2:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 501 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:    LysTyrAlaLeuAlaAspAlaSerLeuLysMetAlaAspProAsnArg    151015    PheArgGlyLysAspLeuProValLeuAspGlnLeuLeuGluValPro    202530    ValTrpLysGluAlaThrThrThrLeuPheCysAlaSerAspAlaLys    354045    AlaTyrAspThrGluAlaHisAsnValTrpAlaThrHisAlaCysVal    505560    ProThrAspProAsnProGlnGluValGluLeuValAsnValThrGlu    65707580    AsnPheAsnMetTrpLysAsnAsnMetValGluGlnMetHisGluAsp    859095    IleIleSerLeuTrpAspGlnSerLeuLysProCysValLysLeuThr    100105110    ProLeuCysValThrLeuAsnCysThrAspLeuArgAsnThrThrAsn    115120125    ThrAsnAsnSerThrAspAsnAsnAsnSerLysSerGluGlyThrIle    130135140    LysGlyGlyGluMetLysAsnCysSerPheAsnIleThrThrSerIle    145150155160    GlyAspLysMetGlnLysGluTyrAlaLeuLeuTyrLysLeuAspIle    165170175    GluProIleAspAsnAspSerThrSerTyrArgLeuIleSerCysAsn    180185190    ThrSerValIleThrGlnAlaCysProLysIleSerPheGluProIle    195200205    ProIleHisTyrCysAlaProAlaGlyPheAlaIleLeuLysCysAsn    210215220    AspLysLysPheSerGlyLysGlySerCysLysAsnValSerThrVal    225230235240    GlnCysThrHisGlyIleArgProValValSerThrGlnLeuLeuLeu    245250255    AsnGlySerLeuAlaGluGluGluValValIleArgSerGluAspPhe    260265270    ThrAspAsnAlaLysThrIleIleValHisLeuLysGluSerValGln    275280285    IleAsnCysThrArgProAsnTyrAsnLysArgLysArgIleHisIle    290295300    GlyProGlyArgAlaPheTyrThrThrLysAsnIleLysGlyThrIle    305310315320    
ArgGlnAlaHisCysIleIleSerArgAlaLysTrpAsnAspThrLeu    325330335    ArgGlnIleValSerLysLeuLysGluGlnPheLysAsnLysThrIle    340345350    ValPheAsnProSerSerGlyGlyAspProGluIleValMetHisSer    355360365    PheAsnCysGlyGlyGluPhePheTyrCysAsnThrSerProLeuPhe    370375380    AsnSerIleTrpAsnGlyAsnAsnThrTrpAsnAsnThrThrGlySer    385390395400    AsnAsnAsnIleThrLeuGlnCysLysIleLysGlnIleIleAsnMet    405410415    TrpGlnLysValGlyLysAlaMetTyrAlaProProIleGluGlyGln    420425430    IleArgCysSerSerAsnIleThrGlyLeuLeuLeuThrArgAspGly    435440445    GlyGluAspThrAspThrAsnAspThrGluIlePheArgProGlyGly    450455460    GlyAspMetArgAspAsnTrpArgSerGluLeuTyrLysTyrLysVal    465470475480    ValThrIleGluProLeuGlyValAlaProThrLysAlaLysArgArg    485490495    ValValGlnArgGlu    500    (2) INFORMATION FOR SEQ ID NO:3:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:    CysLysIleLysGlnIleIleAsnMetTrpGlnLysValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:4:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:    CysArgIleLysGlnPheIleAsnMetTrpGlnGluValGlyLysAla    151015    MetTyrAlaProProIleSerGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:5:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:    CysArgIleLysGlnIleIleAsnMetTrpGlnGluValGlyLysAla    151015    MetTyrAlaProProIleLysGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:6:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:    CysArgIleLysGlnIleIleAsnMetTrpGlnGlyValGlyLysAla   
 151015    MetTyrAlaProProIleGluGlyGlnIleAsnCys    2025    (2) INFORMATION FOR SEQ ID NO:7:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:    CysArgIleLysGlnIleIleAsnArgTrpGlnGluValGlyLysAla    151015    IleTyrAlaProProIleSerGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:8:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:    CysArgIleLysGlnIleValAsnMetTrpGlnArgValGlyGlnAla    151015    MetTyrAlaProProIleLysGlyValIleLysCys    2025    (2) INFORMATION FOR SEQ ID NO:9:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:    CysLysIleLysGlnIleIleAsnMetTrpGlnGlyAlaGlyGlnAla    151015    MetTyrAlaProProIleSerGlyThrIleAsnCys    2025    (2) INFORMATION FOR SEQ ID NO:10:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:    CysArgIleLysGlnPheIleAsnMetTrpGlnGluValGlyLysAla    151015    MetTyrAlaProProIleSerGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:11:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:    CysLysIleLysGlnIleIleAsnMetTrpGlnLysValGlyLysAla    151015    MetTyrAlaProProIleSerGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:12:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:    CysArgIleLysGlnIleIleAsnMetTrpGlnGluValGlyLysAla    151015    
MetTyrAlaProProIleSerGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:13:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:    CysLysIleLysGlnIleIleAsnMetTrpGlnGluValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:14:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 92 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:    SerGlyGlyAspProGluIleValMetHisSerPheAsnCysGlyGly    151015    GluPhePheTyrCysAsnThrSerProLeuPheAsnSerIleTrpAsn    202530    GlyAsnAsnThrTrpAsnAsnThrThrGlySerAsnAsnAsnIleThr    354045    LeuGlnCysLysIleLysGlnIleIleAsnMetTrpGlnLysValGly    505560    LysAlaMetTyrAlaProProIleGluGlyGlnIleArgCysSerSer    65707580    AsnIleThrGlyLeuLeuLeuThrArgAspGlyGly    8590    (2) INFORMATION FOR SEQ ID NO:15:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:    CysLysIleLysGlnIleIleAsnMetTrpGlnGluValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:16:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:    CysLysIleLysGlnIleIleAsnMetTrpGlnAlaValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:17:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:    CysAlaIleLysGlnIleIleAsnMetTrpGlnLysValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:18:    (i) SEQUENCE 
CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:    CysLysIleAlaGlnIleIleAsnMetTrpGlnLysValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:19:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:    CysLysIleLysGlnIleIleAsnMetTrpGlnLysValGlyAlaAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:20:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:    CysLysIleLysGlnIleIleAsnMetTrpGlnLysValGlyLysAla    151015    MetTyrAlaProProIleAlaGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:21:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:    CysArgIleLysGlnPheIleAsnMetTrpGlnGluValGlyLysAla    151015    MetTyrAlaProProIleSerGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:22:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:    CysLysIleLysGlnPheIleAsnMetTrpGlnLysValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:23:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 28 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:    CysLysIleLysGlnPheIleAsnMetTrpGlnGluValGlyLysAla    151015    MetTyrAlaProProIleGluGlyGlnIleArgCys    2025    (2) INFORMATION FOR SEQ ID NO:24:    (i) SEQUENCE CHARACTERISTICS:    
(A) LENGTH: 18 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:    PheIleAsnMetTrpGlnGluValGlyLysAlaMetTyrAlaProPro    151015    IleSer    (2) INFORMATION FOR SEQ ID NO:25:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 12 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:    MetTrpGlnGluValGlyLysAlaMetTyrAlaPro    1510    (2) INFORMATION FOR SEQ ID NO:26:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 14 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:    GlyLysAlaMetTyrAlaProProIleLysGlyGlnIleArg    1510    (2) INFORMATION FOR SEQ ID NO:27:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 2552 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: double    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: CDS    (B) LOCATION: 1..2552    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:    ATGATAGTGAAGGGGATCAGGAAGAATTGTCAGCACTTGTGGAGATGG48    MetIleValLysGlyIleArgLysAsnCysGlnHisLeuTrpArgTrp    151015    GGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTGCAGAAAAA96    GlyThrMetLeuLeuGlyMetLeuMetIleCysSerAlaAlaGluLys    202530    TTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAGCAACC144    LeuTrpValThrValTyrTyrGlyValProValTrpLysGluAlaThr    354045    ACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGGTA192    ThrThrLeuPheCysAlaSerAspAlaLysAlaTyrAspThrGluVal    505560    CATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCA240    HisAsnValTrpAlaThrHisAlaCysValProThrAspProAsnPro    65707580    CAAGAAATAGGATTGGAAAATGTAACAGAAAATTTTAACATGTGGAAA288    GlnGluIleGlyLeuGluAsnValThrGluAsnPheAsnMetTrpLys    859095    AATAACATGGTAGAACAGATGCATGAGGATATAATCAGTTTATGGGAT336    AsnAsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAsp    100105110    CAAAGCTTAAAGCCATGTGTAAAATTAACCCCACTATGTGTTACTTTA384    
GlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu    115120125    AATTGCACTGATTTGAAAAATGCTACTAATACCACTAGTAGCAGCTGG432    AsnCysThrAspLeuLysAsnAlaThrAsnThrThrSerSerSerTrp    130135140    GGAAAGATGGAGAGAGGAGAAATAAAAAACTGCTCTTTCAATGTCACC480    GlyLysMetGluArgGlyGluIleLysAsnCysSerPheAsnValThr    145150155160    ACAAGTATAAGAGATAAGATGAAGAATGAATATGCACTTTTTTATAAA528    ThrSerIleArgAspLysMetLysAsnGluTyrAlaLeuPheTyrLys    165170175    CTTGATGTAGTACCAATAGATAATGATAATACTAGCTATAGGTTGATA576    LeuAspValValProIleAspAsnAspAsnThrSerTyrArgLeuIle    180185190    AGTTGTAACACCTCAGTCATTACACAGGCCTGTCCAAAGGTGTCCTTT624    SerCysAsnThrSerValIleThrGlnAlaCysProLysValSerPhe    195200205    GAGCCAATTCCCATACATTATTGTGCCCCGGCTGGTTTTGCGATTCTA672    GluProIleProIleHisTyrCysAlaProAlaGlyPheAlaIleLeu    210215220    AAGTGTAGAGATAAAAAGTTCAACGGAACAGGACCATGTACAAATGTC720    LysCysArgAspLysLysPheAsnGlyThrGlyProCysThrAsnVal    225230235240    AGCACAGTACAATGTACACATGGAATTAGGCCAGTAGTATCAACTCAA768    SerThrValGlnCysThrHisGlyIleArgProValValSerThrGln    245250255    CTGCTGTTAAATGGCAGTTTAGCAGAAGAAGAAGTAGTAATTAGATCT816    LeuLeuLeuAsnGlySerLeuAlaGluGluGluValValIleArgSer    260265270    GCCAATTTCTCGGACAATGCTAAAACCATAATAGTACAGCTGAACGAA864    AlaAsnPheSerAspAsnAlaLysThrIleIleValGlnLeuAsnGlu    275280285    TCTGTAGAAATTAATTGTACAAGACCCAACAACAATACAAGAAGAAGT912    SerValGluIleAsnCysThrArgProAsnAsnAsnThrArgArgSer    290295300    ATACATATAGGACCAGGGAGAGCATTTTATGCAACAGGAGAAATAATA960    IleHisIleGlyProGlyArgAlaPheTyrAlaThrGlyGluIleIle    305310315320    GGAGACATAAGACAAGCACATTGTAACCTTAGTAGCACAAAATGGAAT1008    GlyAspIleArgGlnAlaHisCysAsnLeuSerSerThrLysTrpAsn    325330335    AATACTTTAAAACAGATAGTTACAAAATTAAGAGAACATTTTAATAAA1056    AsnThrLeuLysGlnIleValThrLysLeuArgGluHisPheAsnLys    340345350    ACAATAGTCTTTAATCACTCCTCAGGAGGGGACCCAGAAATTGTAATG1104    ThrIleValPheAsnHisSerSerGlyGlyAspProGluIleValMet    355360365    CACAGTTTTAATTGTGGAGGGGAATTTTTCTACTGTAATACAACACCA1152    HisSerPheAsnCysGlyGlyGluPhePheTyrCysAsnThrThrPro    370375380    
CTGTTTAATAGTACTTGGAATTATACTTATACTTGGAATAATACTGAA1200    LeuPheAsnSerThrTrpAsnTyrThrTyrThrTrpAsnAsnThrGlu    385390395400    GGGTCAAATGACACTGGAAGAAATATCACACTCCAATGCAGAATAAAA1248    GlySerAsnAspThrGlyArgAsnIleThrLeuGlnCysArgIleLys    405410415    CAAATTATAAACATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCT1296    GlnIleIleAsnMetTrpGlnGluValGlyLysAlaMetTyrAlaPro    420425430    CCCATAAGAGGACAAATTAGATGCTCATCAAATATTACAGGGCTGCTA1344    ProIleArgGlyGlnIleArgCysSerSerAsnIleThrGlyLeuLeu    435440445    TTAACAAGAGATGGTGGTAATAACAGCGAAACCGAGATCTTCAGACCT1392    LeuThrArgAspGlyGlyAsnAsnSerGluThrGluIlePheArgPro    450455460    GGAGGAGGAGATATGAGGGACAATTGGAGAAGTGAATTATATAAATAT1440    GlyGlyGlyAspMetArgAspAsnTrpArgSerGluLeuTyrLysTyr    465470475480    AAAGTAGTAAAAATTGAACCATTAGGAGTAGCACCCACCAAGGCAAAG1488    LysValValLysIleGluProLeuGlyValAlaProThrLysAlaLys    485490495    AGAAGAGTGATGCAGAGAGAAAAAAGAGCAGTGGGAATAGGAGCTGTG1536    ArgArgValMetGlnArgGluLysArgAlaValGlyIleGlyAlaVal    500505510    TTCCTTGGGTTCTTGGGAGCAGCAGGAAGCACTATGGGCGCAGCGTCA1584    PheLeuGlyPheLeuGlyAlaAlaGlySerThrMetGlyAlaAlaSer    515520525    GTGACGCTGACGGTACAGGCCAGACTATTATTGTCTGGTATAGTGCAA1632    ValThrLeuThrValGlnAlaArgLeuLeuLeuSerGlyIleValGln    530535540    CAGCAGAACAATTTGCTGAGGGCTATTGAGGCCGAACAGCATCTGTTG1680    GlnGlnAsnAsnLeuLeuArgAlaIleGluAlaGluGlnHisLeuLeu    545550555560    CAACTCACAGTCTGGGGCATCAAGCAGCTCCAGGCAAGAGTCCTGGCT1728    GlnLeuThrValTrpGlyIleLysGlnLeuGlnAlaArgValLeuAla    565570575    GTGGAGAGATACCTAAAGGATCAACAGCTCCTGGGGATTTGGGGTTGC1776    ValGluArgTyrLeuLysAspGlnGlnLeuLeuGlyIleTrpGlyCys    580585590    TCTGGAAAACTCATCTGCACCACTGCTGTGCCTTGGAATGCTAGTTGG1824    SerGlyLysLeuIleCysThrThrAlaValProTrpAsnAlaSerTrp    595600605    AGTAATAAATCTCTGGATAAGATTTGGGATAACATGACCTGGATGGAG1872    SerAsnLysSerLeuAspLysIleTrpAspAsnMetThrTrpMetGlu    610615620    TGGGAAAGAGAAATTGACAATTACACAAGCTTAATATACAGCTTAATT1920    TrpGluArgGluIleAspAsnTyrThrSerLeuIleTyrSerLeuIle    625630635640    
GAAGAATCGCAGAACCAACAAGAAAAAAATGAACAAGAATTATTGGAA1968    GluGluSerGlnAsnGlnGlnGluLysAsnGluGlnGluLeuLeuGlu    645650655    TTAGATAAATGGGCAAGTTTGTGGAATTGGTTTGACATAACAAAATGG2016    LeuAspLysTrpAlaSerLeuTrpAsnTrpPheAspIleThrLysTrp    660665670    CTGTGGTATATAAAAATATTCATAATGATAGTAGGAGGCTTGGTAGGT2064    LeuTrpTyrIleLysIlePheIleMetIleValGlyGlyLeuValGly    675680685    TTAAGAATAGTTTTTACTGTACTTTCTATAGTGAATAGAGTTAGGAAG2112    LeuArgIleValPheThrValLeuSerIleValAsnArgValArgLys    690695700    GGATACTCACCATTATCGTTCCAGACCCACCTCCCAGCCCCGAGGGGA2160    GlyTyrSerProLeuSerPheGlnThrHisLeuProAlaProArgGly    705710715720    CTCGACAGGCCCGAAGGAACCGAAGAAGAAGGTGGAGAGCGAGACAGA2208    LeuAspArgProGluGlyThrGluGluGluGlyGlyGluArgAspArg    725730735    GACAGATCCAGTCGATTAGTGGATGGATTCTTAGCAATTGTCTGGGTC2256    AspArgSerSerArgLeuValAspGlyPheLeuAlaIleValTrpVal    740745750    GACCTGCGGAGCCTGTGCCTCTTCAGCTACCACCGCTTGAGAGACTTA2304    AspLeuArgSerLeuCysLeuPheSerTyrHisArgLeuArgAspLeu    755760765    CTCTTGATTGCAGCGAGGATTGTGGAACTTCTGGGACGCAGGGGGTGG2352    LeuLeuIleAlaAlaArgIleValGluLeuLeuGlyArgArgGlyTrp    770775780    GAAGCCCTCAAATATTGGTGGAATCTCCTACAGTATTGGATTCAGGAA2400    GluAlaLeuLysTyrTrpTrpAsnLeuLeuGlnTyrTrpIleGlnGlu    785790795800    CTAAAGAATAGTGCTGTTAGCTTGCTCAATGCCACAGCCATAGCAGTA2448    LeuLysAsnSerAlaValSerLeuLeuAsnAlaThrAlaIleAlaVal    805810815    GCTGAGGGAACAGATAGGGTTATAGAAATAGTACAAAGAGCTTATAGA2496    AlaGluGlyThrAspArgValIleGluIleValGlnArgAlaTyrArg    820825830    GCTATTCTCCACATACCCACACGAATAAGACAGGGCTTGGAAAGGGCT2544    AlaIleLeuHisIleProThrArgIleArgGlnGlyLeuGluArgAla    835840845    TTGCTATA2552    LeuLeu    850    (2) INFORMATION FOR SEQ ID NO:28:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 850 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:    MetIleValLysGlyIleArgLysAsnCysGlnHisLeuTrpArgTrp    151015    GlyThrMetLeuLeuGlyMetLeuMetIleCysSerAlaAlaGluLys    202530    
LeuTrpValThrValTyrTyrGlyValProValTrpLysGluAlaThr    354045    ThrThrLeuPheCysAlaSerAspAlaLysAlaTyrAspThrGluVal    505560    HisAsnValTrpAlaThrHisAlaCysValProThrAspProAsnPro    65707580    GlnGluIleGlyLeuGluAsnValThrGluAsnPheAsnMetTrpLys    859095    AsnAsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAsp    100105110    GlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu    115120125    AsnCysThrAspLeuLysAsnAlaThrAsnThrThrSerSerSerTrp    130135140    GlyLysMetGluArgGlyGluIleLysAsnCysSerPheAsnValThr    145150155160    ThrSerIleArgAspLysMetLysAsnGluTyrAlaLeuPheTyrLys    165170175    LeuAspValValProIleAspAsnAspAsnThrSerTyrArgLeuIle    180185190    SerCysAsnThrSerValIleThrGlnAlaCysProLysValSerPhe    195200205    GluProIleProIleHisTyrCysAlaProAlaGlyPheAlaIleLeu    210215220    LysCysArgAspLysLysPheAsnGlyThrGlyProCysThrAsnVal    225230235240    SerThrValGlnCysThrHisGlyIleArgProValValSerThrGln    245250255    LeuLeuLeuAsnGlySerLeuAlaGluGluGluValValIleArgSer    260265270    AlaAsnPheSerAspAsnAlaLysThrIleIleValGlnLeuAsnGlu    275280285    SerValGluIleAsnCysThrArgProAsnAsnAsnThrArgArgSer    290295300    IleHisIleGlyProGlyArgAlaPheTyrAlaThrGlyGluIleIle    305310315320    GlyAspIleArgGlnAlaHisCysAsnLeuSerSerThrLysTrpAsn    325330335    AsnThrLeuLysGlnIleValThrLysLeuArgGluHisPheAsnLys    340345350    ThrIleValPheAsnHisSerSerGlyGlyAspProGluIleValMet    355360365    HisSerPheAsnCysGlyGlyGluPhePheTyrCysAsnThrThrPro    370375380    LeuPheAsnSerThrTrpAsnTyrThrTyrThrTrpAsnAsnThrGlu    385390395400    GlySerAsnAspThrGlyArgAsnIleThrLeuGlnCysArgIleLys    405410415    GlnIleIleAsnMetTrpGlnGluValGlyLysAlaMetTyrAlaPro    420425430    ProIleArgGlyGlnIleArgCysSerSerAsnIleThrGlyLeuLeu    435440445    LeuThrArgAspGlyGlyAsnAsnSerGluThrGluIlePheArgPro    450455460    GlyGlyGlyAspMetArgAspAsnTrpArgSerGluLeuTyrLysTyr    465470475480    LysValValLysIleGluProLeuGlyValAlaProThrLysAlaLys    485490495    ArgArgValMetGlnArgGluLysArgAlaValGlyIleGlyAlaVal    500505510    
PheLeuGlyPheLeuGlyAlaAlaGlySerThrMetGlyAlaAlaSer    515520525    ValThrLeuThrValGlnAlaArgLeuLeuLeuSerGlyIleValGln    530535540    GlnGlnAsnAsnLeuLeuArgAlaIleGluAlaGluGlnHisLeuLeu    545550555560    GlnLeuThrValTrpGlyIleLysGlnLeuGlnAlaArgValLeuAla    565570575    ValGluArgTyrLeuLysAspGlnGlnLeuLeuGlyIleTrpGlyCys    580585590    SerGlyLysLeuIleCysThrThrAlaValProTrpAsnAlaSerTrp    595600605    SerAsnLysSerLeuAspLysIleTrpAspAsnMetThrTrpMetGlu    610615620    TrpGluArgGluIleAspAsnTyrThrSerLeuIleTyrSerLeuIle    625630635640    GluGluSerGlnAsnGlnGlnGluLysAsnGluGlnGluLeuLeuGlu    645650655    LeuAspLysTrpAlaSerLeuTrpAsnTrpPheAspIleThrLysTrp    660665670    LeuTrpTyrIleLysIlePheIleMetIleValGlyGlyLeuValGly    675680685    LeuArgIleValPheThrValLeuSerIleValAsnArgValArgLys    690695700    GlyTyrSerProLeuSerPheGlnThrHisLeuProAlaProArgGly    705710715720    LeuAspArgProGluGlyThrGluGluGluGlyGlyGluArgAspArg    725730735    AspArgSerSerArgLeuValAspGlyPheLeuAlaIleValTrpVal    740745750    AspLeuArgSerLeuCysLeuPheSerTyrHisArgLeuArgAspLeu    755760765    LeuLeuIleAlaAlaArgIleValGluLeuLeuGlyArgArgGlyTrp    770775780    GluAlaLeuLysTyrTrpTrpAsnLeuLeuGlnTyrTrpIleGlnGlu    785790795800    LeuLysAsnSerAlaValSerLeuLeuAsnAlaThrAlaIleAlaVal    805810815    AlaGluGlyThrAspArgValIleGluIleValGlnArgAlaTyrArg    820825830    AlaIleLeuHisIleProThrArgIleArgGlnGlyLeuGluArgAla    835840845    LeuLeu    850    (2) INFORMATION FOR SEQ ID NO:29:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 2573 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: double    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: CDS    (B) LOCATION: 1..2573    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:    ATGAGAGTGAAGGGGATCAGGAGGAATTATCAGCACTTGTGGAGATGG48    MetArgValLysGlyIleArgArgAsnTyrGlnHisLeuTrpArgTrp    151015    GGCACCATGCTCCTTGGGATATTGATGATCTGTAGTGCTGCAGGGAAA96    GlyThrMetLeuLeuGlyIleLeuMetIleCysSerAlaAlaGlyLys    202530    
TTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAACAACC144    LeuTrpValThrValTyrTyrGlyValProValTrpLysGluThrThr    354045    ACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGATA192    ThrThrLeuPheCysAlaSerAspAlaLysAlaTyrAspThrGluIle    505560    CATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCA240    HisAsnValTrpAlaThrHisAlaCysValProThrAspProAsnPro    65707580    CAAGAAGTAGTATTGGAAAATGTGACAGAAAATTTTAACATGTGGAAA288    GlnGluValValLeuGluAsnValThrGluAsnPheAsnMetTrpLys    859095    AATAACATGGTGGAACAGATGCATGAGGATATAATCAGTTTATGGGAT336    AsnAsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAsp    100105110    CAAAGTTTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTA384    GlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu    115120125    AATTGCACTGATGCGGGGAATACTACTAATACCAATAGTAGTAGCAGG432    AsnCysThrAspAlaGlyAsnThrThrAsnThrAsnSerSerSerArg    130135140    GAAAAGCTGGAGAAAGGAGAAATAAAAAACTGCTCTTTCAATATCACC480    GluLysLeuGluLysGlyGluIleLysAsnCysSerPheAsnIleThr    145150155160    ACAAGCGTGAGAGATAAGATGCAGAAAGAAACTGCACTTTTTAATAAA528    ThrSerValArgAspLysMetGlnLysGluThrAlaLeuPheAsnLys    165170175    CTTGATATAGTACCAATAGATGATGATGATAGGAATAGTACTAGGAAT576    LeuAspIleValProIleAspAspAspAspArgAsnSerThrArgAsn    180185190    AGTACTAACTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAG624    SerThrAsnTyrArgLeuIleSerCysAsnThrSerValIleThrGln    195200205    GCCTGTCCAAAGGTATCATTTGAGCCAATTCCCATACATTTCTGTACC672    AlaCysProLysValSerPheGluProIleProIleHisPheCysThr    210215220    CCGGCTGGTTTTGCGCTTCTAAAGTGTAATAATAAGACGTTCAATGGA720    ProAlaGlyPheAlaLeuLeuLysCysAsnAsnLysThrPheAsnGly    225230235240    TCAGGACCATGCAAAAATGTCAGCACAGTACAATGTACACATGGAATT768    SerGlyProCysLysAsnValSerThrValGlnCysThrHisGlyIle    245250255    AGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGAA816    ArgProValValSerThrGlnLeuLeuLeuAsnGlySerLeuAlaGlu    260265270    GGAGAGGTAGTAATTAGATCTGAAAATTTCACGAACAATGCTAAAACC864    GlyGluValValIleArgSerGluAsnPheThrAsnAsnAlaLysThr    275280285    ATAATAGTACAGCTGACAGAACCAGTAAAAATTAATTGTACAAGACCC912    
IleIleValGlnLeuThrGluProValLysIleAsnCysThrArgPro    290295300    AACAACAATACAAGAAAAAGTATACCTATAGGACCAGGGAGAGCATTT960    AsnAsnAsnThrArgLysSerIleProIleGlyProGlyArgAlaPhe    305310315320    TATGCAACAGGAGACATAATAGGAAATATAAGACAAGCACATTGTAAC1008    TyrAlaThrGlyAspIleIleGlyAsnIleArgGlnAlaHisCysAsn    325330335    CTTAGTAGAACAGACTGGAATAACACTTTAGGACAGATAGTTGAAAAA1056    LeuSerArgThrAspTrpAsnAsnThrLeuGlyGlnIleValGluLys    340345350    TTAAGAGAACAATTTGGGAATAAAACAATAATCTTTAATCACTCCTCA1104    LeuArgGluGlnPheGlyAsnLysThrIleIlePheAsnHisSerSer    355360365    GGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTAGAGGGGAA1152    GlyGlyAspProGluIleValMetHisSerPheAsnCysArgGlyGlu    370375380    TTTTTCTACTGTAATACAACACAATTGTTTGACAGTACTTGGGATAAT1200    PhePheTyrCysAsnThrThrGlnLeuPheAspSerThrTrpAspAsn    385390395400    ACTAAAGTGTCAAATGGCACTAGCACTGAAGAGAATAGCACAATCACA1248    ThrLysValSerAsnGlyThrSerThrGluGluAsnSerThrIleThr    405410415    CTCCCATGCAGAATAAAGCAAATTGTAAACATGTGGCAGGAAGTAGGA1296    LeuProCysArgIleLysGlnIleValAsnMetTrpGlnGluValGly    420425430    AAAGCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCA1344    LysAlaMetTyrAlaProProIleArgGlyGlnIleArgCysSerSer    435440445    AATATTACAGGGTTGCTATTAACAAGAGATGGAGGTAGTAACAACAGC1392    AsnIleThrGlyLeuLeuLeuThrArgAspGlyGlySerAsnAsnSer    450455460    ATGAATGAGACCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGG1440    MetAsnGluThrPheArgProGlyGlyGlyAspMetArgAspAsnTrp    465470475480    AGAAGTGAATTATACAAATATAAAGTAGTAAAAATTGAACCATTAGGA1488    ArgSerGluLeuTyrLysTyrLysValValLysIleGluProLeuGly    485490495    GTAGCACCCACCAAGGCAAAGAGAAGAGTGGTGCAGAGAGAAAAAAGA1536    ValAlaProThrLysAlaLysArgArgValValGlnArgGluLysArg    500505510    GCAGTGGGAATAGGAGCTGTGTTCCTTGGGTTCTTAGGAGCAGCAGGA1584    AlaValGlyIleGlyAlaValPheLeuGlyPheLeuGlyAlaAlaGly    515520525    AGCACTATGGGCGCAGCGTCAATAACGCTGACGGTACAGGCCAGACTA1632    SerThrMetGlyAlaAlaSerIleThrLeuThrValGlnAlaArgLeu    530535540    TTATTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATT1680    LeuLeuSerGlyIleValGlnGlnGlnAsnAsnLeuLeuArgAlaIle    
545550555560    GAGGCGCAACAGCATCTGTTGCAACTCATAGTCTGGGGCATCAAGCAG1728    GluAlaGlnGlnHisLeuLeuGlnLeuIleValTrpGlyIleLysGln    565570575    CTCCAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAGGGATCAACAG1776    LeuGlnAlaArgValLeuAlaValGluArgTyrLeuArgAspGlnGln    580585590    CTCCTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACCTCA1824    LeuLeuGlyIleTrpGlyCysSerGlyLysLeuIleCysThrThrSer    595600605    GTGCCTTGGAATGCTAGTTGGAGTAATAAATCTCTAGATAAGATTTGG1872    ValProTrpAsnAlaSerTrpSerAsnLysSerLeuAspLysIleTrp    610615620    GATAACATGACCTGGATGGAGTGGGAAAGAGAAATTGAGAATTACACA1920    AspAsnMetThrTrpMetGluTrpGluArgGluIleGluAsnTyrThr    625630635640    AGCTTAATATACACCTTAATTGAAGAATCGCAGAACCAACAAGAAAAG1968    SerLeuIleTyrThrLeuIleGluGluSerGlnAsnGlnGlnGluLys    645650655    AATGAACAAGACTTATTGGAATTGGATCAATGGGCAAGTCTGTGGAAT2016    AsnGluGlnAspLeuLeuGluLeuAspGlnTrpAlaSerLeuTrpAsn    660665670    TGGTTTAGCATAACAAAATGGCTGTGGTATATAAAAATATTCATAATG2064    TrpPheSerIleThrLysTrpLeuTrpTyrIleLysIlePheIleMet    675680685    ATAGTTGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCT2112    IleValGlyGlyLeuValGlyLeuArgIleValPheAlaValLeuSer    690695700    ATAGTGAATAGAGTTAGGCAGGGATACTCACCATTATCGTTTCAGACC2160    IleValAsnArgValArgGlnGlyTyrSerProLeuSerPheGlnThr    705710715720    CGCCTCCCAGCCCCGAGGAGACCCGACAGGCCCGAAGGAATCGAAGAA2208    ArgLeuProAlaProArgArgProAspArgProGluGlyIleGluGlu    725730735    GAAGGTGGAGAGCAAGGCAGAGACAGATCCATTCGCTTAGTGGATGGA2256    GluGlyGlyGluGlnGlyArgAspArgSerIleArgLeuValAspGly    740745750    TTCTTAGCACTTATCTGGGACGACCTACGGAGCCTGTGCCTCTTCAGC2304    PheLeuAlaLeuIleTrpAspAspLeuArgSerLeuCysLeuPheSer    755760765    TACCACCGCTTGAGAGACTTACTCTTGATTGCAACGAGGATTGTGGAA2352    TyrHisArgLeuArgAspLeuLeuLeuIleAlaThrArgIleValGlu    770775780    CTTCTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTC2400    LeuLeuGlyArgArgGlyTrpGluAlaLeuLysTyrTrpTrpAsnLeu    785790795800    CTACAGTATTGGATTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTT2448    LeuGlnTyrTrpIleGlnGluLeuLysAsnSerAlaValSerLeuLeu    805810815    
AATGTCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTTTAGAA2496    AsnValThrAlaIleAlaValAlaGluGlyThrAspArgValLeuGlu    820825830    GTATTACAAAGAGCTTATAGAGCTATTCTCCACATACCTACAAGAATA2544    ValLeuGlnArgAlaTyrArgAlaIleLeuHisIleProThrArgIle    835840845    AGACAGGGCTTGGAAAGGGCTTTGCTATA2573    ArgGlnGlyLeuGluArgAlaLeuLeu    850855    (2) INFORMATION FOR SEQ ID NO:30:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 857 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:    MetArgValLysGlyIleArgArgAsnTyrGlnHisLeuTrpArgTrp    151015    GlyThrMetLeuLeuGlyIleLeuMetIleCysSerAlaAlaGlyLys    202530    LeuTrpValThrValTyrTyrGlyValProValTrpLysGluThrThr    354045    ThrThrLeuPheCysAlaSerAspAlaLysAlaTyrAspThrGluIle    505560    HisAsnValTrpAlaThrHisAlaCysValProThrAspProAsnPro    65707580    GlnGluValValLeuGluAsnValThrGluAsnPheAsnMetTrpLys    859095    AsnAsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAsp    100105110    GlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu    115120125    AsnCysThrAspAlaGlyAsnThrThrAsnThrAsnSerSerSerArg    130135140    GluLysLeuGluLysGlyGluIleLysAsnCysSerPheAsnIleThr    145150155160    ThrSerValArgAspLysMetGlnLysGluThrAlaLeuPheAsnLys    165170175    LeuAspIleValProIleAspAspAspAspArgAsnSerThrArgAsn    180185190    SerThrAsnTyrArgLeuIleSerCysAsnThrSerValIleThrGln    195200205    AlaCysProLysValSerPheGluProIleProIleHisPheCysThr    210215220    ProAlaGlyPheAlaLeuLeuLysCysAsnAsnLysThrPheAsnGly    225230235240    SerGlyProCysLysAsnValSerThrValGlnCysThrHisGlyIle    245250255    ArgProValValSerThrGlnLeuLeuLeuAsnGlySerLeuAlaGlu    260265270    GlyGluValValIleArgSerGluAsnPheThrAsnAsnAlaLysThr    275280285    IleIleValGlnLeuThrGluProValLysIleAsnCysThrArgPro    290295300    AsnAsnAsnThrArgLysSerIleProIleGlyProGlyArgAlaPhe    305310315320    TyrAlaThrGlyAspIleIleGlyAsnIleArgGlnAlaHisCysAsn    325330335    LeuSerArgThrAspTrpAsnAsnThrLeuGlyGlnIleValGluLys    340345350    
LeuArgGluGlnPheGlyAsnLysThrIleIlePheAsnHisSerSer    355360365    GlyGlyAspProGluIleValMetHisSerPheAsnCysArgGlyGlu    370375380    PhePheTyrCysAsnThrThrGlnLeuPheAspSerThrTrpAspAsn    385390395400    ThrLysValSerAsnGlyThrSerThrGluGluAsnSerThrIleThr    405410415    LeuProCysArgIleLysGlnIleValAsnMetTrpGlnGluValGly    420425430    LysAlaMetTyrAlaProProIleArgGlyGlnIleArgCysSerSer    435440445    AsnIleThrGlyLeuLeuLeuThrArgAspGlyGlySerAsnAsnSer    450455460    MetAsnGluThrPheArgProGlyGlyGlyAspMetArgAspAsnTrp    465470475480    ArgSerGluLeuTyrLysTyrLysValValLysIleGluProLeuGly    485490495    ValAlaProThrLysAlaLysArgArgValValGlnArgGluLysArg    500505510    AlaValGlyIleGlyAlaValPheLeuGlyPheLeuGlyAlaAlaGly    515520525    SerThrMetGlyAlaAlaSerIleThrLeuThrValGlnAlaArgLeu    530535540    LeuLeuSerGlyIleValGlnGlnGlnAsnAsnLeuLeuArgAlaIle    545550555560    GluAlaGlnGlnHisLeuLeuGlnLeuIleValTrpGlyIleLysGln    565570575    LeuGlnAlaArgValLeuAlaValGluArgTyrLeuArgAspGlnGln    580585590    LeuLeuGlyIleTrpGlyCysSerGlyLysLeuIleCysThrThrSer    595600605    ValProTrpAsnAlaSerTrpSerAsnLysSerLeuAspLysIleTrp    610615620    AspAsnMetThrTrpMetGluTrpGluArgGluIleGluAsnTyrThr    625630635640    SerLeuIleTyrThrLeuIleGluGluSerGlnAsnGlnGlnGluLys    645650655    AsnGluGlnAspLeuLeuGluLeuAspGlnTrpAlaSerLeuTrpAsn    660665670    TrpPheSerIleThrLysTrpLeuTrpTyrIleLysIlePheIleMet    675680685    IleValGlyGlyLeuValGlyLeuArgIleValPheAlaValLeuSer    690695700    IleValAsnArgValArgGlnGlyTyrSerProLeuSerPheGlnThr    705710715720    ArgLeuProAlaProArgArgProAspArgProGluGlyIleGluGlu    725730735    GluGlyGlyGluGlnGlyArgAspArgSerIleArgLeuValAspGly    740745750    PheLeuAlaLeuIleTrpAspAspLeuArgSerLeuCysLeuPheSer    755760765    TyrHisArgLeuArgAspLeuLeuLeuIleAlaThrArgIleValGlu    770775780    LeuLeuGlyArgArgGlyTrpGluAlaLeuLysTyrTrpTrpAsnLeu    785790795800    LeuGlnTyrTrpIleGlnGluLeuLysAsnSerAlaValSerLeuLeu    805810815    AsnValThrAlaIleAlaValAlaGluGlyThrAspArgValLeuGlu    820825830    
ValLeuGlnArgAlaTyrArgAlaIleLeuHisIleProThrArgIle    835840845    ArgGlnGlyLeuGluArgAlaLeuLeu    850855    (2) INFORMATION FOR SEQ ID NO:31:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 2570 base pairs    (B) TYPE: nucleic acid    (C) STRANDEDNESS: double    (D) TOPOLOGY: linear    (ii) MOLECULE TYPE: DNA (genomic)    (ix) FEATURE:    (A) NAME/KEY: CDS    (B) LOCATION: 1..2570    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:    ATGAGAGTGAAGAGGATCAGGAGGAATTATCAGCACTTGTGGAAATGG48    MetArgValLysArgIleArgArgAsnTyrGlnHisLeuTrpLysTrp    151015    GGCACCATGCTCCTTGGGATGTTGATGATCTGTAGTGCTGCAGGAAAA96    GlyThrMetLeuLeuGlyMetLeuMetIleCysSerAlaAlaGlyLys    202530    TTGTGGGTCACAGTCTATTATGGGGTACCTGTGTGGAAAGAAACAACC144    LeuTrpValThrValTyrTyrGlyValProValTrpLysGluThrThr    354045    ACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACAGAGATA192    ThrThrLeuPheCysAlaSerAspAlaLysAlaTyrAspThrGluIle    505560    CATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCA240    HisAsnValTrpAlaThrHisAlaCysValProThrAspProAsnPro    65707580    CAAGAAGTAGTATTGGAAAATGTGACAGAAAATTTTAACATGTGGAAA288    GlnGluValValLeuGluAsnValThrGluAsnPheAsnMetTrpLys    859095    AATAACATGGTGGAACAGATGCATGAGGATATAATCAGTTTATGGGAT336    AsnAsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAsp    100105110    CAAAGTCTAAAGCCATGTGTAAAATTAACCCCACTCTGTGTTACTTTA384    GlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu    115120125    AATTGCACTGATGCGGGGAATACTACTAATACCAATAGTAGTAGCGGG432    AsnCysThrAspAlaGlyAsnThrThrAsnThrAsnSerSerSerGly    130135140    GAAAAGCTGGAGAAAGGAGAAATAAAAAACTGCTCTTTCAATATCACC480    GluLysLeuGluLysGlyGluIleLysAsnCysSerPheAsnIleThr    145150155160    ACAAGCATGAGAGATAAGATGCAGAGAGAAACTGCACTTTTTAATAAA528    ThrSerMetArgAspLysMetGlnArgGluThrAlaLeuPheAsnLys    165170175    CTTGATATAGTACCAATAGATGATGATGATAGGAATAGTACTAGGAAT576    LeuAspIleValProIleAspAspAspAspArgAsnSerThrArgAsn    180185190    AGTACTAACTATAGGTTGATAAGTTGTAACACCTCAGTCATTACACAG624    SerThrAsnTyrArgLeuIleSerCysAsnThrSerValIleThrGln    195200205    
GCCTGTCCAAAGGTATCATTTGAGCCAATTCCCATACATTTCTGTACC672    AlaCysProLysValSerPheGluProIleProIleHisPheCysThr    210215220    CCGGCTGGTTTTGCGCTTCTAAAGTGTAATAATGAGACGTTCAATGGA720    ProAlaGlyPheAlaLeuLeuLysCysAsnAsnGluThrPheAsnGly    225230235240    TCAGGACCATGCAAAAATGTCAGCACAGTACTATGTACACATGGAATT768    SerGlyProCysLysAsnValSerThrValLeuCysThrHisGlyIle    245250255    AGGCCAGTAGTATCAACTCAACTGCTGTTAAATGGCAGTCTAGCAGGA816    ArgProValValSerThrGlnLeuLeuLeuAsnGlySerLeuAlaGly    260265270    GAAGAGGTAGTAATTAGATCTGAAAATTTCACGAACAATGCTAAAACC864    GluGluValValIleArgSerGluAsnPheThrAsnAsnAlaLysThr    275280285    ATAATAGTACAGCTCAAAGAACCAGTAAAAATTAATTGTACAAGACCC912    IleIleValGlnLeuLysGluProValLysIleAsnCysThrArgPro    290295300    AACAACAATACAAGAAAAAGTATACCTATAGGACCAGGGAGAGCATTT960    AsnAsnAsnThrArgLysSerIleProIleGlyProGlyArgAlaPhe    305310315320    TATGCAACAGGCGACATAATAGGAAATATAAGACAAGCACATTGTAAC1008    TyrAlaThrGlyAspIleIleGlyAsnIleArgGlnAlaHisCysAsn    325330335    CTTAGTAGAACAGACTGGAATAACACTTTAAGACAGATAGCTGAAAAA1056    LeuSerArgThrAspTrpAsnAsnThrLeuArgGlnIleAlaGluLys    340345350    TTAAGAAAACAATTTGGGAATAAAACAATAATCTTTAATCACTCCTCA1104    LeuArgLysGlnPheGlyAsnLysThrIleIlePheAsnHisSerSer    355360365    GGAGGGGACCCAGAAATTGTAATGCACAGTTTTAATTGTAGAGGGGAA1152    GlyGlyAspProGluIleValMetHisSerPheAsnCysArgGlyGlu    370375380    TTTTTCTACTGTGATACAACACAATTGTTTAACAGTACTTGGAATGCA1200    PhePheTyrCysAspThrThrGlnLeuPheAsnSerThrTrpAsnAla    385390395400    AATAACACTGAAAGGAATAGCACTAAAGAGAATAGCACAATCACACTC1248    AsnAsnThrGluArgAsnSerThrLysGluAsnSerThrIleThrLeu    405410415    CCATGCAGAATAAAACAAATTGTAAACATGTGGCAGGAAGTAGGAAAA1296    ProCysArgIleLysGlnIleValAsnMetTrpGlnGluValGlyLys    420425430    GCAATGTATGCCCCTCCCATCAGAGGACAAATTAGATGTTCATCAAAT1344    AlaMetTyrAlaProProIleArgGlyGlnIleArgCysSerSerAsn    435440445    ATTACAGGGTTGCTATTAACAAGAGATGGAGGTAGTAGCAACAGCATG1392    IleThrGlyLeuLeuLeuThrArgAspGlyGlySerSerAsnSerMet    450455460    AATGAGACCTTCAGACCTGGAGGAGGAGATATGAGGGACAATTGGAGA1440    
AsnGluThrPheArgProGlyGlyGlyAspMetArgAspAsnTrpArg    465470475480    AGTGAATTATACAAATATAAAGTAGTAAAAATTGAACCATTAGGAGTA1488    SerGluLeuTyrLysTyrLysValValLysIleGluProLeuGlyVal    485490495    GCACCCACCAAGGCAATGAGAAGAGTGGTGCAGAGAGAAAAAAGAGCA1536    AlaProThrLysAlaMetArgArgValValGlnArgGluLysArgAla    500505510    GTGGGAATAGGAGCTGTGTTCCTTGGGTTCTTAGGAGCAGCAGGAAGC1584    ValGlyIleGlyAlaValPheLeuGlyPheLeuGlyAlaAlaGlySer    515520525    ACTATGGGCGCAGCGTCAATAACGCTGACGGTACAGGCCAGACTATTA1632    ThrMetGlyAlaAlaSerIleThrLeuThrValGlnAlaArgLeuLeu    530535540    TTGTCTGGTATAGTGCAACAGCAGAACAATTTGCTGAGGGCTATTGAG1680    LeuSerGlyIleValGlnGlnGlnAsnAsnLeuLeuArgAlaIleGlu    545550555560    GCGCAACAGCATCTGTTGCAACTCACAGTCTGGGGCATCAAGCAGCTC1728    AlaGlnGlnHisLeuLeuGlnLeuThrValTrpGlyIleLysGlnLeu    565570575    CAGGCAAGAGTCCTGGCTGTGGAAAGATACCTAAGGGATCAACAGCTC1776    GlnAlaArgValLeuAlaValGluArgTyrLeuArgAspGlnGlnLeu    580585590    CTGGGGATTTGGGGTTGCTCTGGAAAACTCATTTGCACCACCTCTGTG1824    LeuGlyIleTrpGlyCysSerGlyLysLeuIleCysThrThrSerVal    595600605    CCTTGGAATGCTAGTTGGAGTAATAAATCTCTAGATAAGATTTGGGAT1872    ProTrpAsnAlaSerTrpSerAsnLysSerLeuAspLysIleTrpAsp    610615620    AACATGACCTGGATGGAGTGGGAAAGAGAAATTGAGAATTACACAAGC1920    AsnMetThrTrpMetGluTrpGluArgGluIleGluAsnTyrThrSer    625630635640    TTAATATACACCTTAATTGAAGAATCGCAGAACCAACAAGAAAAGAAT1968    LeuIleTyrThrLeuIleGluGluSerGlnAsnGlnGlnGluLysAsn    645650655    AAACAAGACTTATTGGAATTGGATCAATAGGCAAGTTTGTGGAATTGG2016    LysGlnAspLeuLeuGluLeuAspGln*AlaSerLeuTrpAsnTrp    660665670    TTTAGCATAACAAAATGGCTGTGGTATATAAAAATATTCATAATGATA2064    PheSerIleThrLysTrpLeuTrpTyrIleLysIlePheIleMetIle    675680685    GTTGGAGGCTTGGTAGGTTTAAGAATAGTTTTTGCTGTACTTTCTATA2112    ValGlyGlyLeuValGlyLeuArgIleValPheAlaValLeuSerIle    690695700    GTGAATAGAGTTAGGCAGGGGTACTCACCATTATCATTTCAGACCCGC2160    ValAsnArgValArgGlnGlyTyrSerProLeuSerPheGlnThrArg    705710715720    CTCCCAGCCCCGAGGGGACCCGACAGGCCCAAAGGAATCGAAGAAGAA2208    LeuProAlaProArgGlyProAspArgProLysGlyIleGluGluGlu    
725730735    GGTGGAGAGCAAGACAGGGACAGATCCATTCGCTTAGTGGATGGATTC2256    GlyGlyGluGlnAspArgAspArgSerIleArgLeuValAspGlyPhe    740745750    TTAGCACTTATCTGGGACGATCTACGGAGCCTGTGCCTCTTCAGCTAC2304    LeuAlaLeuIleTrpAspAspLeuArgSerLeuCysLeuPheSerTyr    755760765    CACCGCTTGAGAGACTTACTCTTGATTGCAACGAGGATTGTGGAACTT2352    HisArgLeuArgAspLeuLeuLeuIleAlaThrArgIleValGluLeu    770775780    CTGGGACGCAGGGGGTGGGAAGCCCTCAAATATTGGTGGAATCTCCTA2400    LeuGlyArgArgGlyTrpGluAlaLeuLysTyrTrpTrpAsnLeuLeu    785790795800    CAGTATTGGATTCAGGAACTAAAGAATAGTGCTGTTAGCTTGCTTAAT2448    GlnTyrTrpIleGlnGluLeuLysAsnSerAlaValSerLeuLeuAsn    805810815    GTCACAGCCATAGCAGTAGCTGAGGGGACAGATAGGGTTCTAGAAGCA2496    ValThrAlaIleAlaValAlaGluGlyThrAspArgValLeuGluAla    820825830    TTGCAAAGAGCTTATAGAGCTATTCTCCACATACCTACAAGAATAAGA2544    LeuGlnArgAlaTyrArgAlaIleLeuHisIleProThrArgIleArg    835840845    CAAGGCTTGGAAAGGGCTTTGCTATA2570    GlnGlyLeuGluArgAlaLeuLeu    850855    (2) INFORMATION FOR SEQ ID NO:32:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 665 amino acids    (B) TYPE: amino acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:    MetArgValLysArgIleArgArgAsnTyrGlnHisLeuTrpLysTrp    151015    GlyThrMetLeuLeuGlyMetLeuMetIleCysSerAlaAlaGlyLys    202530    LeuTrpValThrValTyrTyrGlyValProValTrpLysGluThrThr    354045    ThrThrLeuPheCysAlaSerAspAlaLysAlaTyrAspThrGluIle    505560    HisAsnValTrpAlaThrHisAlaCysValProThrAspProAsnPro    65707580    GlnGluValValLeuGluAsnValThrGluAsnPheAsnMetTrpLys    859095    AsnAsnMetValGluGlnMetHisGluAspIleIleSerLeuTrpAsp    100105110    GlnSerLeuLysProCysValLysLeuThrProLeuCysValThrLeu    115120125    AsnCysThrAspAlaGlyAsnThrThrAsnThrAsnSerSerSerGly    130135140    GluLysLeuGluLysGlyGluIleLysAsnCysSerPheAsnIleThr    145150155160    ThrSerMetArgAspLysMetGlnArgGluThrAlaLeuPheAsnLys    165170175    LeuAspIleValProIleAspAspAspAspArgAsnSerThrArgAsn    180185190    SerThrAsnTyrArgLeuIleSerCysAsnThrSerValIleThrGln    195200205    
AlaCysProLysValSerPheGluProIleProIleHisPheCysThr    210215220    ProAlaGlyPheAlaLeuLeuLysCysAsnAsnGluThrPheAsnGly    225230235240    SerGlyProCysLysAsnValSerThrValLeuCysThrHisGlyIle    245250255    ArgProValValSerThrGlnLeuLeuLeuAsnGlySerLeuAlaGly    260265270    GluGluValValIleArgSerGluAsnPheThrAsnAsnAlaLysThr    275280285    IleIleValGlnLeuLysGluProValLysIleAsnCysThrArgPro    290295300    AsnAsnAsnThrArgLysSerIleProIleGlyProGlyArgAlaPhe    305310315320    TyrAlaThrGlyAspIleIleGlyAsnIleArgGlnAlaHisCysAsn    325330335    LeuSerArgThrAspTrpAsnAsnThrLeuArgGlnIleAlaGluLys    340345350    LeuArgLysGlnPheGlyAsnLysThrIleIlePheAsnHisSerSer    355360365    GlyGlyAspProGluIleValMetHisSerPheAsnCysArgGlyGlu    370375380    PhePheTyrCysAspThrThrGlnLeuPheAsnSerThrTrpAsnAla    385390395400    AsnAsnThrGluArgAsnSerThrLysGluAsnSerThrIleThrLeu    405410415    ProCysArgIleLysGlnIleValAsnMetTrpGlnGluValGlyLys    420425430    AlaMetTyrAlaProProIleArgGlyGlnIleArgCysSerSerAsn    435440445    IleThrGlyLeuLeuLeuThrArgAspGlyGlySerSerAsnSerMet    450455460    AsnGluThrPheArgProGlyGlyGlyAspMetArgAspAsnTrpArg    465470475480    SerGluLeuTyrLysTyrLysValValLysIleGluProLeuGlyVal    485490495    AlaProThrLysAlaMetArgArgValValGlnArgGluLysArgAla    500505510    ValGlyIleGlyAlaValPheLeuGlyPheLeuGlyAlaAlaGlySer    515520525    ThrMetGlyAlaAlaSerIleThrLeuThrValGlnAlaArgLeuLeu    530535540    LeuSerGlyIleValGlnGlnGlnAsnAsnLeuLeuArgAlaIleGlu    545550555560    AlaGlnGlnHisLeuLeuGlnLeuThrValTrpGlyIleLysGlnLeu    565570575    GlnAlaArgValLeuAlaValGluArgTyrLeuArgAspGlnGlnLeu    580585590    LeuGlyIleTrpGlyCysSerGlyLysLeuIleCysThrThrSerVal    595600605    ProTrpAsnAlaSerTrpSerAsnLysSerLeuAspLysIleTrpAsp    610615620    AsnMetThrTrpMetGluTrpGluArgGluIleGluAsnTyrThrSer    625630635640    LeuIleTyrThrLeuIleGluGluSerGlnAsnGlnGlnGluLysAsn    645650655    LysGlnAspLeuLeuGluLeuAspGln    660665    (2) INFORMATION FOR SEQ ID NO:33:    (i) SEQUENCE CHARACTERISTICS:    (A) LENGTH: 190 amino acids    (B) TYPE: amino 
acid    (C) STRANDEDNESS: single    (D) TOPOLOGY: linear    (xi) SEQUENCE DESCRIPTION: SEQ ID NO:33:    AlaSerLeuTrpAsnTrpPheSerIleThrLysTrpLeuTrpTyrIle    151015    LysIlePheIleMetIleValGlyGlyLeuValGlyLeuArgIleVal    202530    PheAlaValLeuSerIleValAsnArgValArgGlnGlyTyrSerPro    354045    LeuSerPheGlnThrArgLeuProAlaProArgGlyProAspArgPro    505560    LysGlyIleGluGluGluGlyGlyGluGlnAspArgAspArgSerIle    65707580    ArgLeuValAspGlyPheLeuAlaLeuIleTrpAspAspLeuArgSer    859095    LeuCysLeuPheSerTyrHisArgLeuArgAspLeuLeuLeuIleAla    100105110    ThrArgIleValGluLeuLeuGlyArgArgGlyTrpGluAlaLeuLys    115120125    TyrTrpTrpAsnLeuLeuGlnTyrTrpIleGlnGluLeuLysAsnSer    130135140    AlaValSerLeuLeuAsnValThrAlaIleAlaValAlaGluGlyThr    145150155160    AspArgValLeuGluAlaLeuGlnArgAlaTyrArgAlaIleLeuHis    165170175    IleProThrArgIleArgGlnGlyLeuGluArgAlaLeuLeu    180185190
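Each CDS entry above pairs a nucleotide line with its translation, one three-letter residue per codon under the standard genetic code. The listed lengths agree with that reading. In SEQ ID NO:31, an in-frame TAG codon (shown as `*`) separates the 665 residues of SEQ ID NO:32 from the 190 residues of SEQ ID NO:33. That gives 665 + 1 + 190 = 856 complete codons, which account for 856 × 3 = 2568 of the 2570 listed bases. The last two bases (`TA`) form a trailing partial codon.

The minimal Python sketch below reproduces this codon-by-codon reading on short excerpts copied from the listing. It assumes the standard genetic code (NCBI translation table 1). The helper names `translate` and `split_at_stops` are hypothetical illustrations and are not taken from the source.

```python
from itertools import product

# Standard genetic code (assumed NCBI table 1), in the conventional TCAG order:
# first base varies slowest, third base fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "*",  # stop codon, written as in the listing
}
CODON_TABLE = {
    "".join(codon): THREE[aa]
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> list[str]:
    """Translate a coding sequence codon by codon (hypothetical helper).

    A trailing partial codon (1 or 2 leftover bases) is ignored, and stop
    codons are kept as "*" so they stay visible in the output.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]


def split_at_stops(residues: list[str]) -> list[list[str]]:
    """Split a translated reading frame into stop-free segments."""
    segments: list[list[str]] = [[]]
    for residue in residues:
        if residue == "*":
            segments.append([])
        else:
            segments[-1].append(residue)
    return segments


# 48-nt excerpt of SEQ ID NO:31 around the in-frame TAG codon.
excerpt = "AAACAAGACTTATTGGAATTGGATCAATAGGCAAGTTTGTGGAATTGG"
residues = translate(excerpt)
print("".join(residues))
# → LysGlnAspLeuLeuGluLeuAspGln*AlaSerLeuTrpAsnTrp
print([len(s) for s in split_at_stops(residues)])
# → [9, 6]
```

Applied to the full SEQ ID NO:31 CDS, the same split would be expected to give segments of 665 and 190 residues, matching the stated lengths of SEQ ID NO:32 and SEQ ID NO:33.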

What is claimed is:
1. A DNA sequence of less than 5 kilobases encoding gp120 from GNE₈ and having the nucleotide sequence of SEQ ID NO:27.
2. A DNA sequence of less than 5 kilobases encoding gp120 from GNE₁₆ and having a nucleotide sequence selected from SEQ ID NO:29 and SEQ ID NO:31.
3. An expression construct comprising DNA encoding gp120 selected from the group consisting of GNE₈-gp120 and GNE₁₆-gp120 under the transcriptional control of a heterologous promoter.
4. The expression construct of claim 3 wherein the promoter is a eukaryotic promoter.
5. The expression construct of claim 4 wherein the DNA encoding gp120 is joined to a heterologous signal sequence.
6. An isolated GNE₈-gp120 polypeptide having the amino acid sequence of SEQ ID NO:28.
7. An isolated GNE₁₆-gp120 polypeptide having an amino acid sequence selected from the group consisting of SEQ ID NO:30, SEQ ID NO:32, and SEQ ID NO:33.